Universal file virtualization with disaggregated control plane, security plane and decentralized data plane

ABSTRACT

The present disclosure relates to Universal File Virtualization (UFV) that functions like a single virtual data hub spanning on-premise storage at various data silos, data centers, cloud data resources stored in IaaS, PaaS and SaaS, remote offices and branch offices, and hybrid clouds, primarily catering to secondary data storage and combining cyber resilience technologies, information security, file storage and object storage technologies. The proposed solution is built upon a disaggregated control plane, security plane and decentralized data plane architecture. The system controller, security controller and Universal File System (UFS) modules apply various file virtualization, security or data services algorithms to data that passes through them. The present disclosure also brings in a new concept called UFV, implementing a secure UFS spanning all disparate data sources of a corporation distributed across geographies and cloud services, with a centralized control plane, a security plane and a decentralized data plane built out of secure vaults controlled by a data controller.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. application Ser. No. 16/723,772, filed Dec. 20, 2019, which claims the benefit of priority to Indian Application No. 201841022971, filed Dec. 20, 2018, the contents of each of which are hereby incorporated by reference herein in their entirety.

TECHNICAL FIELD

The present disclosure is related to Information and Storage Security, Wide Area File Storage Virtualization and Data Protection. The invention particularly focuses on the cyber resilience and data protection aspects of fragmented information systems in a global enterprise with different forms of IT silos across on-premise locations and cloud services.

BACKGROUND

Cloud computing and cloud storage networking are quickly becoming the way Information Technology (IT) and storage are delivered. With cloud-scale storage systems, customers can now take advantage of various cloud resources on demand, without an upfront investment. Vendors such as Amazon and Rackspace offer storage resources delivered to customers over the internet. Customers can now buy a minimal storage resource in their own data center and add cloud storage as needed.

Cloud storage is very attractive for customers who are on a low budget, who cannot predict their storage demands, or who want to store their mission-critical data in well-protected, SAS 70 Type II, tier 1 data centers that they could not afford otherwise. Cloud storage also offers various cost advantages in terms of operational expenses, as customers do not need to spend on managing and administering storage systems.

Because other conventional wide area, distributed file systems are used for primary storage use cases, distributed locking and concurrency control are big challenges and make the file system deployment complex in a multi-data-center, multi-location scenario.

In typical scenarios, a company with multiple sites may allow users to store data in multiple storage systems and in various cloud services such as Google Drive or Dropbox, while archived data may be in infrastructure clouds such as Amazon Simple Storage Service (S3), Azure or similar. File data may be in hosted servers such as in an IaaS cloud or in Software as a Service (SaaS) application stores. So, an IT head faces new challenges in managing data in multiple storage silos, enforcing storage management policies, security controls and GDPR data compliance requirements, and providing universal access and search regardless of where the data is stored. New cyber threats call for a data platform that provides the finest visibility across all of an organization's data assets, while actual data stores have to be protected and isolated from attacks like ransomware and related cyber threats. In today's storage architecture, data is typically stored in a single location, leaving IT more vulnerable to ransomware attacks. If a single site is compromised, all the data is lost. This is a single point of breach (SPOB), analogous to the familiar single point of failure (SPOF).

Clearly, a solution is needed for decoupling physical file storage from where the physical storage can be accessed and manipulated, in alignment with business policies and also with the way the data foundation is built.

When customers use many cloud storage providers, SaaS/IaaS/PaaS services and data in multiple locations, there is no mechanism to have a unified view of all storage that lives in all storage silos with file system level access semantics, and there are no benefits of virtualization that span all such silos. There may be tools that bring together all data at one place and provide access through a graphical user interface. But a solid data platform that provides a file system interface to the user, with integrated file virtualization across disparate storage silos, is a clear gap in the industry today. If the cloud provider goes down or goes out of business, data becomes unavailable. If cloud storage is temporarily disconnected, there has to be a way for host-based applications or local storage gateway based applications to continue functioning. Data may flow to the cloud in an uncontrolled manner, so there needs to be a way to classify the data and then tier it across storage. Applications may use traditional protocols like Network File System (NFS) or Common Internet File System (CIFS).

If the data is stored in public cloud storage, there has to be a way of translating conventional protocols to cloud Application Program Interfaces (APIs)/semantics, so that customers can adopt cloud storage without disrupting their existing applications. Customer data may be at huge risk if all the data owned by cloud storage applications is stored in a single cloud that is owned by a single administrative authority, which may go out of business. There has to be a way of pooling storage resources across multiple providers and delivering them to host or gateway based storage applications in such a way that all the above problems are eliminated. When cloud storage is accessed by a host, the host should be able to do its job even if the connection to the cloud is lost. In a conventional data protection infrastructure, there has to be a way of automatically scaling data to the cloud, transparently, without impacting applications. Data should be virtualized across different storage systems within a data center or across multiple cloud providers. So an automatic integration of cloud storage into the host or data center is required, in such a way that cloud availability, security or integration is not an issue, to implement a cloud-native, low-cost and scalable data protection environment with intelligent file level storage virtualization. Separate data silos can be protected, migrated and archived through a central data services controller, which is also called the SD-Controller in this invention.

There have been many distributed file systems or wide area file systems. But they all run on homogeneous storage interfaces and protocols, though they may be running on different operating systems. All such file systems were designed for a campus LAN and built before the era of public cloud. None of these file systems supports dissimilar storage connectors. None of these file systems has the concept of a centralized security plane and control plane with a decentralized data plane architecture. Most of these file systems are designed for primary storage use cases and do not have any built-in content analytics or data classification that can be applied universally across all data silos. None of the existing file systems has the ability to integrate the data of various systems, at the secondary storage level, based on data criticality and security profiles across the IT silos of a corporation. None of the existing file systems has the concept of storage intrusion detection and prevention. None of the existing file systems has the ability to tolerate a single point of attack, as they were built before the era of ransomware. Existing storage systems lack data security as a foundation feature, though they offer mechanisms to use encryption or access control. None of the prior art supports security by design and default. None of the existing innovations has a system-defined architecture with a central controller, security controller and data controller all working independently of the actual user data location, making them unsuitable for providing unified data services across disparate data silos.

SUMMARY

The present disclosure relates to a universal file system which functions like a single large system spanning on-premise storage at various sites, cloud services, cloud data resources stored in IaaS, PaaS and SaaS, remote offices and branch offices, and hybrid clouds.

The disclosure provides Universal File Services and Universal File Virtualization in a Wide Area Network (WAN) environment, spanning all data locations of a corporation, cloud provider or any form of IT organization, including remote offices, branch offices, headquarters data centers, cloud servers and different forms of cloud services including IaaS, PaaS and SaaS. The invention is a lifeline for GDPR (General Data Protection Regulation) compliant data stores, as there is a dire need for central data governance and data security built in by design. Cyber threats, the likes of ransomware, require additional security for data stores, built-in data services and central control, which are realized through this invention. More particularly, embodiments of the invention also provide a secure way to integrate fragmented storage silos across disparate locations deploying different kinds of storage systems using different storage protocols or storage interfaces. Embodiments of the invention integrate IaaS, PaaS and SaaS data stores, various locations and data centers of a corporation, private cloud storage and public cloud storage, with intelligent, replicated metadata controllers, also known as system controllers, in the middle acting as the central hub of intelligence, and with separate security services monitoring every storage activity over a decentralized data plane. With the invention, the actual location of the file data at any location, any storage silo or any cloud is decoupled from access semantics, with a security by design and default tenet, realizing a truly secure, universal file virtualization across a Wide Area Network.

Through the Universal File System interface, data located at any data source owned by a corporation can now be accessed as if it were located in the local drive of the PC of a user sitting in the corporate data center. “Universal” means covering the entire data universe, be it remote office, branch office or clouds, across different forms of a Wide Area Network. The whole “data universe” of a corporation is made as simple as a single “local drive” to a user or an administrator. The invention is built upon a split control plane, security plane and data plane architecture. The metadata controller and on-premise storage gateways apply various file storage virtualization or management algorithms to data that passes through them. All technologies are applied across various cloud providers, storage sites and cloud applications. This disclosure makes data at any storage site, cloud service, cloud server, branch office, remote office or any file in any app of a corporation appear and be accessible as if it were in a local file system at any on-premise controller. The present disclosure also brings in a new concept, called “Universal File Virtualization”, implementing a Universal File System with a centralized control plane and a decentralized data plane backed by hybrid clouds and/or secure data vaults, allowing a data user to access any file data located anywhere, be it in a remote office PC, branch office server, IaaS server, SaaS service or PaaS platform. The data is available as if it were in the local drive of the user's PC, and the user can do whatever he used to do with his local files, making data control, visibility and security for data stored outside the corporate data center simple and secure.

The present disclosure relates to a set of methods and an architecture for implementing universal file virtualization, also known as a Universal File System (UFS), with various converged file services, having a single global data fabric converging various file storage silos, with a separate control plane, security plane and decentralized data plane, built upon a set of Universal File Virtualization and data services methods, across on-premise, IaaS, PaaS and SaaS data sources and hybrid storage clouds, with or without cyber-secured, secure data vaults.

Throughout the disclosure, the invention may be referred to as a UFS (Universal File System).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an implementation of universal file storage where an on-premise gateway integrates data in different cloud services with a central metadata controller (system controller), while actual data is stored in different storage clouds, in accordance with some embodiments of the present disclosure;

FIG. 2 is a block diagram of a 3-way, distributed file storage, implementing universal file storage, with 3 on-premise locations, without any cloud services, while a subset of the file data is in public cloud storage services, in accordance with some embodiments of the present disclosure;

FIG. 3 is a block diagram of an implementation of file storage virtualization of data from Remote Offices, Branch Offices (ROBO) with multiple cloud storage systems, in accordance with some embodiments of the present disclosure;

FIG. 4 is a system-defined, universal file storage system encompassing the data from remote offices and branch offices, cloud services and 2 on-premise gateways with metadata stored separately, while a subset of the file data is stored in dispersed storage services in various public clouds, in accordance with some embodiments of the present disclosure;

FIG. 5 is another aspect of a deployment diagram of various components of the invention with private and public clouds as well as different cloud services;

FIG. 6 explains in detail the aspects of the UFS core module that presents a file system level interface to IT for all data stored outside the data center; and

FIG. 7 explains an embodiment with data containers, system controller and security controller with disaggregated data exchange.

DETAILED DESCRIPTION

The foregoing description has broadly outlined the features and technical advantages of the present disclosure in order that the detailed description of the disclosure that follows may be better understood. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. The novel features which are believed to be characteristic of the disclosure, both as to its organization and method of operation, together with further objects and advantages, will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present disclosure.

The Universal File System of the current invention, also termed UFS, can be used as a file system with security by design, central control and decentralized storage. While UFS can be used for primary use cases, UFS is optimized and specially built to work as a secondary storage platform. As such, complex locking, concurrency control and network latencies are not as important as in a traditional wide area file system. As UFS is meant for use cases such as data governance, data services convergence and data security rather than application access, data at the actual sources may not need to be up to date with what UFS exposes through its file system interface or through its central system controller interface. In this perspective, UFS can be considered a secondary storage data virtualization meant for data administrators, data protection officers, data compliance officers and data users, rather than one meant to be consumed by an application, such as a database, needing a primary storage access experience. What makes the invention distinctive is its disaggregated control plane, data plane with decentralized secure vaults, and security plane with converged metadata, security and data services. The invention uniquely combines data management, data protection, data control and visibility, and storage security in a single, virtual file security foundation.

The accompanying descriptions are for the purpose of providing thorough explanations, with numerous specific details. The field of cloud storage/networked storage is so vast that many different variations of the described and illustrated inventions are possible. Many implementations are possible with ideas that can be derived from this disclosure and that match new storage protocols or different data center environments. Ideas or combinations of subsets of ideas described herein can be applied to a corporate data center environment or a Local Area Network (LAN) environment. Those skilled in the art will thus undoubtedly appreciate that the invention can be practiced without some specific details described below, and indeed will see that many other variations and embodiments of the invention can be practiced while still satisfying its teachings and spirit. For example, although the present disclosure is described with reference to cloud storage, it can similarly be embodied in any form of utility/grid based storage clusters or data centers running various protocols including Internet Small Computer System Interface (iSCSI), Fibre Channel over Internet Protocol (FCIP), Cloud Data Management Interface (CDMI), Network Attached Storage (NAS), Hyper Text Transfer Protocol (HTTP), Structured Query Language (SQL) and ATA over Ethernet (AoE), etc.

The process features or functions of the present invention can be implemented by a computing device. As an example, computing devices may include enterprise servers, application servers, workstations, personal computers, network computers, network appliances, personal digital assistants, set-top boxes and personal communication devices.

Definitions of Technical Terms Used

Cloud: A cloud is a network or networked data center comprising a group of computers, network and storage devices, running machine-executable program instructions or storing or processing machine-storable digital data. Data access is first received by the firewall, and then application traffic is processed by the virtualization layer based on processing provisioning logic, billing information, etc. The other key part is the virtualization layer that virtualizes physical resources. In cloud computing, this virtualization layer is typically a hypervisor such as Xen or VMware; in cloud storage, it is a file virtualization layer that virtualizes the underlying file servers, such as those denoted by 1006.

ROBO: ROBO stands for Remote Office, Branch Office. A typical corporation may have a central site, regional headquarters, remote offices and branch offices where employees may be working.

File Servers: A file server is a server machine that runs a standard network file access protocol like NFS (developed by Sun Microsystems) or CIFS (developed by Microsoft). File access can be issued by any computer connected to the IP network, which performs the file access over the NFS/CIFS protocol.

A proxy is a computer system that intercepts some sort of traffic over the network, does some processing, redirects the request to another server, receives the response and sends it back to the original client. In the context of the invention, the proxy intercepts all the traffic between the client and a destination cloud, and is hence called a cloud proxy.

Redundant Array of Inexpensive Disks (RAID): RAID is a data protection technology where different blocks of data are mirrored, striped or parity encoded, so that if one or more disks fail, data is still recoverable. There are various types of RAID. RAID 0 is simple striping, where different blocks of data are divided into stripes and written to different disks. RAID 1 implements mirroring. RAID 5 and RAID 6 both use parity encoding. There are other enhancements, like erasure-coded RAID, in the literature.

Cloud Application Services versus Cloud Storage Services: Cloud application services are services such as Google Drive, Dropbox or box.net, which users use as part of an application in most cases. For example, Dropbox storage is used as part of the Dropbox file sharing and collaboration tool. Google Drive is used as part of Gmail. Similarly, various SaaS applications are used.

Cloud storage services are public storage clouds meant for delivering raw storage in various forms. For example, Amazon S3 is an object level storage service, whereas Amazon provides block storage through Elastic Block Store (EBS) and compute services through EC2. Other vendors offer similar models. Typically, cloud application services in turn use public cloud storage services for final placement of user data.

Metadata Controller: A system or set of computer systems meant to store, create, translate, process and communicate various forms of intelligence or data for controlling or changing the behavior of actual user data.

Private, hybrid, public, federated: A private cloud is a private implementation of an enterprise for its own use. It can also be hosted by a third-party provider, but owned and managed by the customer. A public cloud is hosted, owned and managed by a third-party provider. Hybrid and federated clouds are different amalgamations/unions of private and public clouds in accordance with the policies of the providers involved. Hosted private cloud storage is dedicated, third-party managed cloud storage, owned by the customer or the provider.

A cloud file is a file stored in cloud storage. Cloud file virtualization involves virtualizing access to a cloud file in a way that transparently redirects the file access.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

It will be understood by those within the art that, in general, terms used herein are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the detailed description may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations).

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the above detailed description.

Some Technical Terms of the Invention are Described Below:

Data set: This is a data layout representing a group of data bytes stored in a computer file. It contains metadata, security data and actual data. Sometimes a data set may contain only metadata. In some embodiments, it may contain only security data encoding the access control attributes, permissions, user IDs, security credentials and data classification attributes of a file, such as classified, public or confidential, or user data or metadata, or any combination of these. File metadata includes information for identifying the file, file ownership, file locations and so on. Various data layouts can be used, as different computer science data structures can be selected. In an exemplary embodiment, the metadata could be a list of comma-separated key-value pairs. Metadata contains information such as the presence of metadata, the number of user files stored in this data set, the location of the user data in the file carrying the data set, the location of the next metadata pointer, the start of the user data section and the start of the security data section, and can contain more such security, metadata and file storage parameters. It is similar to a Zip file or Tar file, which contains the metadata for all member files, used for extracting individual files.
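
As an illustration only, the data set layout described above can be sketched in Python as follows. The function names, the key names (`has_metadata`, `file_count`, `security_start`, `data_start`, `index`) and the choice of a newline to terminate the metadata header are assumptions made for this sketch and are not specified by the disclosure. Offsets are counted from the end of the header line, and file names are assumed not to contain commas, semicolons or newlines.

```python
def pack_data_set(files, security):
    """Pack user files and security attributes into one data set (hypothetical layout)."""
    # Security section: comma-separated key=value pairs.
    security_blob = ",".join(f"{k}={v}" for k, v in security.items()).encode()
    index, payload = [], b""
    for name, content in files.items():
        # Each index entry records name, offset and length inside the user data section.
        index.append(f"{name}:{len(payload)}:{len(content)}")
        payload += content
    meta = {
        "has_metadata": "1",
        "file_count": str(len(files)),
        "security_start": "0",                 # relative to the end of the header
        "data_start": str(len(security_blob)),  # user data follows the security section
        "index": ";".join(index),
    }
    header = ",".join(f"{k}={v}" for k, v in meta.items()).encode()
    return header + b"\n" + security_blob + payload


def unpack_data_set(blob):
    """Recover metadata, security attributes and individual files from a data set."""
    header, body = blob.split(b"\n", 1)
    meta = dict(pair.split("=", 1) for pair in header.decode().split(","))
    data_start = int(meta["data_start"])
    security_text = body[:data_start].decode()
    security = dict(p.split("=", 1) for p in security_text.split(",")) if security_text else {}
    files = {}
    if meta["index"]:
        for entry in meta["index"].split(";"):
            name, offset, length = entry.rsplit(":", 2)
            start = data_start + int(offset)
            files[name] = body[start:start + int(length)]
    return meta, security, files
```

As with a Zip or Tar archive, the index in the header is what allows a single member file to be extracted without scanning the whole user data section.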

Agent module: This is a system installed in a PC, with system programs that can navigate file systems, look up file changes, and compare each file change against normal changes or abnormal changes such as ransomware activity. This agent system has the capability to package multiple files across different folders into a single data set and send it to the metadata controller or to data plane components for further processing and storage.

Ransomware attack signatures: Ransomware can encrypt a file, which equates to a full file change. It can remove the contents, which equates to drastic file changes. Ransomware can rename files, which also equates to a drastic change of the original file name. Ransomware can perform data exfiltration, which equates to a huge data transfer across the network. All these infection signatures can be used to detect a ransomware attack pattern.
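
The four signatures above can be expressed as simple checks on a single file change event. The following Python sketch is illustrative only: the function names, the signal labels and both thresholds (90% of bytes changed, 100 MiB transferred) are assumptions chosen for the example, not values taken from the disclosure.

```python
def changed_fraction(old: bytes, new: bytes) -> float:
    """Fraction of byte positions that differ; any length difference counts as changed."""
    longest = max(len(old), len(new))
    if longest == 0:
        return 0.0
    unchanged = sum(1 for a, b in zip(old, new) if a == b)
    return 1.0 - unchanged / longest


def ransomware_signals(old_name, new_name, old, new, bytes_sent=0,
                       change_threshold=0.9, transfer_threshold=100 * 1024 * 1024):
    """Return the list of suspicious signals raised by one file change event."""
    signals = []
    # Encryption rewrites nearly every byte of an existing, non-empty file.
    if old and new and changed_fraction(old, new) >= change_threshold:
        signals.append("full_file_change")
    # Content removal shrinks the file drastically.
    if old and len(new) <= len(old) * (1 - change_threshold):
        signals.append("content_removed")
    # Renaming, for example appending a new extension.
    if new_name != old_name:
        signals.append("renamed")
    # Exfiltration shows up as an unusually large network transfer.
    if bytes_sent >= transfer_threshold:
        signals.append("large_transfer")
    return signals
```

An agent could, for instance, treat any event that raises two or more signals as abnormal and withhold it from being committed as good data; that policy is likewise an assumption of this sketch.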

Storage partitions: UFS has built-in data classification. This means that UFS understands file classes and treats data accordingly, providing different types of Quality of Service for data security and the underlying storage architecture. UFS, in its global name space, allocates various partitions to treat data according to its type and class. For example, the Archives partition treats all data stored in it as long-term archives. UFS has a central GUI-based configuration module which takes input from data administrators on various data classification parameters, such as content in user data, content in file names, ownership and so on. UFS also supports versions. In one embodiment, UFS commits all new data that is validated as good data to a new version of the storage epoch.
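
A minimal sketch of rule-based classification into partitions is given below, assuming that the administrator's input is reduced to an ordered list of rules over the three parameters named above (content in user data, content in file names, ownership). The `Rule` class, the `classify` function, the partition names and the first-match-wins policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One classification rule; an empty field means 'do not test this parameter'."""
    partition: str
    name_contains: str = ""
    content_contains: bytes = b""
    owner: str = ""

    def matches(self, name: str, owner: str, content: bytes) -> bool:
        return ((not self.name_contains or self.name_contains in name.lower())
                and (not self.content_contains or self.content_contains in content)
                and (not self.owner or self.owner == owner))


def classify(rules, name, owner, content, default="general"):
    """Return the partition of the first matching rule, or the default partition."""
    for rule in rules:
        if rule.matches(name, owner, content):
            return rule.partition
    return default


# Example rule set, as it might be entered through a configuration module.
RULES = [
    Rule(partition="confidential", content_contains=b"CONFIDENTIAL"),
    Rule(partition="archives", name_contains=".bak"),
    Rule(partition="finance", owner="finance-team"),
]
```

With these rules, `classify(RULES, "q3.xlsx", "finance-team", b"totals")` selects the `finance` partition, while a file that matches no rule falls into the default partition.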

Wherever ransomware is mentioned, the description is equally applicable, in different forms and with adaptations, to other network worms as well.

Secure Vault, also described as Data Containers: It is very common to store file level data in file systems or in object storage systems for scalability. A typical object server listens on an IP address and a port that are accessible from any network service. Data containers (the secure vault) are a core part of the invention, adding secure network isolation capability to traditional object storage. The secure vault stores data in the form of immutable objects, while the system containing the objects does not listen on an IP address or a port. Using an ephemeral IP address and port, it connects to a component in the UFS module called the data proxy, gets authenticated through an open SSL channel, and initiates a TCP connection. This data proxy performs the role of synchronizing all data without needing to initiate a connection to the secure vault. The data proxy is included in the UFS module, System Controller and Security Controller for data communication with data containers. Once the TCP connection is established, the TCP client takes the role of a server and the flow of the TCP stream is reversed. This way, only a trusted service running in the UFS module can exchange data with the secure vault, through this mechanism of reverse TCP flow, preventing ransomware attacks on the secure vault. As UFS modules and secure vault systems are continually monitored through the security controller, the risk of ransomware attack is reduced even further. In some embodiments, data containers will be hybrid-cloud storage services or purely public cloud services. The secure vault or data containers can be built out of a mix of on-premise vaults and cloud services, forming a hybrid-cloud based secure data vault which is connected by the data controller to UFS.
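
The reverse TCP flow can be illustrated with a short Python sketch on the loopback interface. Here the vault never listens: it dials out to the data proxy, and once connected it answers requests that arrive over that same connection. The line-based `GET`/`QUIT` protocol, the function names and the in-memory object store are assumptions made for this sketch; the SSL authentication step is omitted, and object values are assumed not to contain newlines.

```python
import socket
import threading


def vault_worker(proxy_addr, store):
    """Secure vault side: opens the connection outward, then serves requests on it."""
    with socket.create_connection(proxy_addr) as conn, conn.makefile("rwb") as stream:
        for line in stream:
            cmd, _, arg = line.decode().strip().partition(" ")
            if cmd == "GET":
                # Role reversal: the TCP client now behaves as the server.
                stream.write(store.get(arg, b"") + b"\n")
                stream.flush()
            elif cmd == "QUIT":
                break


def proxy_fetch(listener, keys):
    """Data proxy side: accepts the vault's connection, then issues requests over it."""
    conn, _ = listener.accept()
    results = {}
    with conn, conn.makefile("rwb") as stream:
        for key in keys:
            stream.write(f"GET {key}\n".encode())
            stream.flush()
            results[key] = stream.readline().rstrip(b"\n")
        stream.write(b"QUIT\n")
        stream.flush()
    return results


def demo():
    """Run proxy and vault in one process to show the reversed request flow."""
    listener = socket.create_server(("127.0.0.1", 0))  # only the proxy listens
    listener.settimeout(5)
    store = {"obj1": b"fragment-one"}
    vault = threading.Thread(target=vault_worker, args=(listener.getsockname(), store))
    vault.start()
    try:
        results = proxy_fetch(listener, ["obj1", "missing"])
    finally:
        listener.close()
    vault.join()
    return results
```

Because the vault has no listening socket, a process elsewhere on the network has nothing to connect to; only the proxy that the vault chose to dial can send it requests.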

Data controllers: A data controller is the interfacing service running as an independent system or as part of the UFS module or system controller, depending upon the embodiment of the invention. The data controller, typically part of the UFS module, is connected to data containers or secure vaults. The data controller implements data services and data dispersal using various information-theoretic coding schemes such as Reed-Solomon, transforms user data to object format and sends the transformed data to data containers.

SD controller: The SD controller, or System Defined Controller, is a sub-unit integrated in the System Controller, taking configuration and management data from a data officer or administrator. The SD controller passes this on to the system controller, which redistributes it to the security controller and UFS modules.

Data Plane: The data plane includes all components where user data is stored and received from.

Control Plane: The control plane includes all components storing metadata, configuration data and management data. The metadata controller (system controller) is the key part of the control plane.

Security Plane: The security plane receives and stores all security profiles and security configuration data and redistributes them to data containers, UFS modules and System Controllers.

All file level data stored in end systems in remote offices, in servers in branch offices, in headquarters data centers or in SaaS data services is consolidated by copying, backing up or archiving. Such consolidated data is then stored in a decentralized data foundation. In between, data may be transformed through encryption, compression, erasure coding and deduplication. These transformed data streams are stored in cloud storage services or the secure vault in the form of object files. As source files are transformed into more than two fragments in the form of object files, the loss of a fragment will not affect data availability. As individual fragment objects are stored after data transformation with encryption and/or erasure coding, the loss of an individual fragment will not cause any data leak. When fragment objects are stored in an erasure-coded, decentralized secure vault or across multiple cloud providers, a successful ransomware attack becomes nearly impossible, and cyber resilience is improved, as no complete piece of data is stored anywhere.
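
The transformation and dispersal described above can be sketched in Python under simplifying assumptions. The sketch compresses the data, splits it into two halves plus one XOR parity fragment (a simple stand-in for Reed-Solomon coding that tolerates the loss of any one of the three fragments), and stores each fragment in a different vault under its SHA-256 digest so that identical fragments are stored only once. Encryption is omitted because the Python standard library provides no cipher; in practice each fragment would be encrypted before storage. The function names, the manifest format and the use of plain dictionaries as vaults are hypothetical.

```python
import hashlib
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, vaults):
    """Compress data, cut it into 2 data fragments + 1 parity, one fragment per vault."""
    packed = zlib.compress(data)
    half = (len(packed) + 1) // 2
    first = packed[:half]
    second = packed[half:].ljust(half, b"\0")  # pad so both halves have equal length
    fragments = [first, second, xor_bytes(first, second)]
    keys = []
    for vault, fragment in zip(vaults, fragments):
        key = hashlib.sha256(fragment).hexdigest()  # content address
        vault.setdefault(key, fragment)             # deduplication: store once
        keys.append(key)
    return {"keys": keys, "packed_len": len(packed)}


def reassemble(manifest, vaults) -> bytes:
    """Rebuild the original data, tolerating the loss of any single fragment."""
    frags = [vault.get(key) for vault, key in zip(vaults, manifest["keys"])]
    missing = [i for i, frag in enumerate(frags) if frag is None]
    if len(missing) > 1:
        raise ValueError("more than one fragment lost; data cannot be recovered")
    if missing:
        survivors = [frag for frag in frags if frag is not None]
        frags[missing[0]] = xor_bytes(survivors[0], survivors[1])
    packed = (frags[0] + frags[1])[:manifest["packed_len"]]
    return zlib.decompress(packed)
```

For example, after `manifest = disperse(data, [v1, v2, v3])`, emptying any one of the three vaults still lets `reassemble(manifest, [v1, v2, v3])` return the original data, while no single vault holds a complete copy of the compressed object.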

Referring to FIG. 1, 750 is a block diagram of the on-premise IT center of a company, where 55-A inside the diagram is a system module running in a PC that allows client machines labeled PC1, PC2, PC3 and PC4 to connect with 55A over a standard NFS or CIFS protocol interface. 55A is also defined as the on-premise gateway, as part of the invention. The on-premise gateway extracts various data from 751, which is explained shortly. Entities labeled 55-C, 55-D and 55-e can be various cloud services such as Google Apps or Software as a Service (SaaS) offerings, or hosted cloud servers. Through the cloud-provider supplied APIs, 751, which is a metadata cloud storage service, makes a copy of or extracts sufficient metadata into a suitable storage medium in the metadata cloud storage, which is embedded as part of the system. 751 then erasure codes the file data and stores different fragments in various public cloud storage services such as Amazon S3, Azure from Microsoft or similar cloud storage systems. The information needed to retrieve all this data, which is also known as metadata, is stored in 751, and is also replicated to 752 to avoid a single point of failure. The on-premise gateway, 750, “syncs” this metadata, and the NFS or CIFS protocol service can now see all file objects stored in 50-c, 50-d or 50-e as if they were local. Data access results in an on-demand data transfer between 50-A and the public cloud storage services (50-f, 50-g and 50-h).

FIG. 2 discloses another exemplary embodiment of the present disclosure. 1201, 1207 and 1204 are different on-premise locations having the same setup as 750 of FIG. 1. Each of 1201-1, 1207-1 and 1204-1 syncs metadata with the metadata cloud 1202, which is a centralized distributor of metadata and the data-routing proxy for actual data to and from the various cloud storage services (1205). As each of the on-premise gateways syncs metadata to this centralized metadata repository, and full data is directly available in public cloud storage services, all data and metadata are now available in every on-premise gateway. Hence, data captive in any location is available for access and view in every other location, while the data is physically stored elsewhere. This is the universal file storage virtualization, as this invention truly de-couples file access from file storage in novel ways. 1205 is the backup of metadata that is otherwise stored in 1202.

Referring to FIG. 3, a multi-site storage integration with integration of public cloud storage virtualization is shown. 40-C, 40-D and 40-E are branch offices of a hypothetical enterprise, while 650 is a headquarters. 651 is a storage system that stores all metadata and some form of backup data. 40A in 650 is a system that communicates with the agents installed in the Personal Computers (PCs) (labeled 001, 002, etc.). 50-A, 50-B and 50-C are public cloud storage services, which store dispersed data emanating from 40-A.

Referring to FIG. 4, 1000 and 1001 are two on-premise locations, while 100a and 100b are public cloud storage applications such as Google Apps, Salesforce or similar services. 2001 and 2002 are the remote offices/branch offices running various system agents on personal workstations or other personal devices such as smartphones. 3001 and 3002 are the primary metadata cloud and the secondary metadata cloud. 2004, 2008 and 2012 are various public cloud storage services. In other embodiments, the number of public storage clouds can be five or more, though only three services are shown as a minimum requirement of the invention. On-premise gateways integrate file objects in Remote Office, Branch Office (ROBO) sites, and then to the metadata clouds as explained above. 100a and 100b also integrate data into the metadata clouds. The metadata clouds in the middle act as the central hub of all information control and access.

Referring to FIG. 4, 1000a and 1000b are the on-premise gateways, each having an instance of the metadata controller, which serves files locally to all users mounted to the server over standard file access protocols (CIFS, NFS). The gateway also receives data from backup agents installed in ROBO sites through its backup server. The backup server then translates all storage metadata, extracts data from the backup format, and re-integrates data and metadata separately in a format that NFS and CIFS clients can access, while backup metadata is translated to the form used by other metadata controllers, which can be accessed by the on-premise gateway in such a way that it can serve files over NAS protocols. As metadata and user data are stored separately, data of different forms can be integrated and served over NAS protocols as a universal file system. Similarly, data created by a browser and uploaded to a cloud service portal such as 3001, which is the cloud service portal and metadata controller, can be integrated into the universal file hierarchy by normalizing user data and metadata into a universal format, and delivered as a Universal file system. Similarly, data of other file systems can be combined. Extending the same idea, once the Universal file system can extract user data and metadata from their native formats into a universal format that it recognizes, it can create a single logical view and an interface to access and manipulate files created by any form of file service (such as NFS, CIFS, backup, archive, object, cloud service, SaaS application, collaboration system, social websites, browser uploads, e-mails) running at any location, as a single large file storage platform accessible from anywhere.

Centralized business rules can configure and change the way metadata is distributed, normalized and integrated, and also the way data is copied, backed up and migrated. Hence, system-defined control and programmability is achieved for universal file storage virtualization. Suitable APIs can invoke requests to hide or change the way metadata and data are abstracted and exported/imported. Metadata synchronization from the primary and central sites, and also with other sites, is implemented through transaction semantics. Referring to FIG. 4, 3001 can be the primary site for metadata, while 3002 is the secondary site and 1000b is the gateway that integrates the metadata to it. Data movement or replication can happen from on-premise gateways to the cloud, from on-premise gateways to other on-premise gateways, or through a central metadata controller (also referred to as the system controller) such as 3001. System-defined methods drive the way data is moved, replicated and backed up. For example, system controls can be inserted to replicate certain directories at certain sites to subsets of other sites, to migrate and archive certain data from certain types of cloud services, and to back up ROBO data and replicate it to DR sites. System controls can be placed to archive certain types of data to public storage clouds, with erasure coding or replication as appropriate. System-defined controls can also govern the way data is de-duplicated across multiple storage services and multiple sites and clouds. These are the methods we invented to implement Universal data management, driven by system-defined mechanisms, spanning multiple data types, storage sites and various cloud services of an enterprise. All the data management and the file system can be invoked as a single system to realize a converged universal file system and data management, or universal data management can be implemented as a standalone system.

Universal File Virtualization primarily includes the ability of a file to be accessible from any location, regardless of where the file is stored, as the underlying storage is made virtual. The data storage virtualization further comprises the ability of files to be migrated from one location to another location or to public storage clouds for archiving or for various data distribution reasons. The data storage virtualization also comprises the ability of files to be copied from one location to another, or across the federation of storage clouds, transparently to the user, for backup or disaster protection purposes. UFS allows virtualizing secondary data and also primary data, though the invention is targeted primarily at the secondary storage market.

All functionalities are internally delivered by the central metadata controller and the on-premise gateway (an instance of the UFS module). The metadata controller (system controller) also processes user data, primarily created at cloud application services or ROBOs, which is moved to public storage clouds. On-premise gateway systems can send user data to public storage clouds directly or through the metadata controller.

In an exemplary embodiment, the invention can appear as illustrated in FIG. 5.

FIG. 5 shows a typical IT environment of a large enterprise having multiple sites, where data is scattered across various other services, branch locations or remote offices. 11004 and 11005 are branch locations with much IT equipment and file servers, housing hundreds or thousands of employees. 11007 is a remote office. 11006 is the corporate data center. 11000 is the System Controller. 11000 can also be the headquarters or a cloud, or could also be condensed as a virtual machine that runs in the corporate DC. 11003 can be an application package which can also be run alongside 11001, or as part of 11001 or 156. 11002 is the recovery location of the control plane. 11000 is the system controller, running in primary mode of operation with a centralized architecture. 11001 is the primary metadata controller or the . 11003 is the module that receives system instructions or system-defined methods. This could also be by way of simple provisioning of various data services, such as backup policies from which site to which site, or migration from a source data location to a destination data location. This also includes the data life cycle management policies for selecting the destination clouds.

For example, a simple table entry could indicate that backup traffic is steered only to private clouds and archive data to a set of predefined public clouds. It could also add data classification and information life cycle management policies to determine the actual destination clouds, all controlled by system-defined constructs. Data classification parameters also include the content type, strings contained in the file names, the owner of the files, the type of data silo, the type of the files, etc. The invention involves a novel architecture of a centrally placed control plane and a decentralized data plane. The entire architecture leverages an all-new concept of split data and metadata architecture, which allows seamless integration of different data silos to realize the implementation of the invention. The core idea of split metadata and data is to separate the actual location of the data from the metadata, so that data silos do not come in the way of file access. This way, a Universal namespace is realized by the invention, as all metadata is centrally integrated, with all the information needed to direct data access to different forms of clouds through the novel implementation of the hybrid-cloud system that is part of this invention. Referring to FIG. 5, 11001 and 11002 are the primary and secondary nodes of the metadata controllers, which are part of the centralized system controller (also referred to as the control plane), 11000. All system components running in various PCs at 11007 and the gateway systems in various data sites such as 11004 and 11005 are part of the source side of the data plane. The data plane also includes various cloud servers such as 157 and 158, as well as 159, 160 and 161, which run different cloud services. All the different data plane elements run storage modules which use different or the same storage access protocols. 11011 is the data controller, which executes instructions for data services and for the data transmission involved in storing data received from UFS modules. The data controller can be a separate system, or can be an embedded part of the UFS modules. The data controller is connected to one or more cloud services, which are private cloud, public cloud or on-premise storage vault. 167A is one such private cloud service. These modules are also defined as the data containers.
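The table-entry idea described above (backup traffic steered to private clouds, archive data to predefined public clouds, with classification rules refining the choice) can be sketched as a small rule lookup. This is only an illustrative sketch: the cloud identifiers, the `FileInfo` fields, the `select_destinations` function and the "confidential" file-name rule are all assumptions introduced here and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cloud pools; the identifiers are illustrative only.
PRIVATE_CLOUDS = ["private-vault-1"]
PUBLIC_CLOUDS = ["public-cloud-a", "public-cloud-b", "public-cloud-c"]


@dataclass(frozen=True)
class FileInfo:
    name: str      # file name, used for string-based classification
    owner: str     # owner of the file
    silo: str      # type of data silo, e.g. "ROBO", "branch", "SaaS"
    service: str   # requested data service: "backup" or "archive"


def select_destinations(info: FileInfo) -> List[str]:
    """Return the target clouds for a file according to a simple rule table."""
    # Assumed classification rule: a sensitive string in the file name
    # keeps the data in private clouds regardless of the data service.
    if "confidential" in info.name.lower():
        return list(PRIVATE_CLOUDS)
    if info.service == "backup":
        return list(PRIVATE_CLOUDS)   # backup traffic only to private clouds
    if info.service == "archive":
        return list(PUBLIC_CLOUDS)    # archive data to predefined public clouds
    raise ValueError(f"no policy for service {info.service!r}")
```

In such a sketch, further classification parameters (owner, silo type, content type) would simply become additional rules evaluated before the service-based defaults.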

11005 is the security controller, centrally monitoring every UFS module and also the data containers attached to the data controller. The system controller receives the security profile and security configuration data of various sites and users, which are then pushed to the security controller. Security configuration can include the disabling of a UFS system, if a security policy is set for that particular UFS module. As different data silos hold different types of data, the security profile of each data source can differ based on the criticality and sensitivity of the data. When a UFS module copies data from client systems to secondary storage, the security profile of the data is also learned from the file extensions, the file content, the presence of personally identifiable information, etc.
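As a rough illustration of learning a security profile from file extensions, file content and the presence of personally identifiable information, the sketch below assigns one of two levels. The extension list, the single PII pattern and the level names are assumptions made for this example; the disclosure does not specify them.

```python
import re

# Hypothetical list of extensions treated as sensitive, for illustration only.
SENSITIVE_EXTENSIONS = {".pem", ".key", ".kdbx"}

# Very rough stand-in for PII detection: a pattern shaped like NNN-NN-NNNN.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def security_profile(file_name: str, content: str) -> str:
    """Classify a file as "high" or "normal" sensitivity."""
    lower = file_name.lower()
    if any(lower.endswith(ext) for ext in SENSITIVE_EXTENSIONS):
        return "high"      # learned from the file extension
    if PII_PATTERN.search(content):
        return "high"      # learned from PII found in the content
    return "normal"
```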

In the context of the invention, Universal File Virtualization is not real time, as the invention is not applied to in-band production data. So data from any source storage system is first migrated to the cloud layer, which is a private cloud, a public cloud, or any combination. All recovery metadata is created at the source storage system and instantly replicated to the metadata controller in the control plane system. Various nodes wanting to offer Universal file virtualization capability then redistribute the metadata from the central metadata plane. With completely distributed metadata on the various source storage systems, which form the distributed data plane, the invention brings out a radical architecture and method for Universal file virtualization. Referring to FIG. 5 again, 11004 and 11005 are two instances of distributed data silos, in the exemplary representation branch sites. At site 11004, there are two NAS boxes, 150 and 151. 152 at site 11004 is an instance of the distributed data plane module, and 11005 also has the same role. 11006 is the headquarters data center, where 156 is a NAS-based interface. 153, 154 and 155 are also various forms of storage servers. 157 and 158 are two forms of servers in the outsourced cloud provider data center. 159 and 160 are different physical or virtual machines holding data generated by services, or could be SaaS-based file services such as Dropbox™ or Google Drive™. 161 is any entity holding stored data owned by the customer. Data can be created at any part of the distributed data plane.

At the employee PCs used at location 11007, files get created or uploaded. An installed system component then copies or migrates the file data to a hybrid-cloud based architecture. 165, 166 and 167 are public cloud services, and 165A, 166A and 167A are private cloud services. Data from PCs at the site 11007 first arrives at ROBO module 201. 201 extracts the file from the data stream, normalizes the file path to a universal path in such a way that it can be referenced uniquely from any other location, then looks up the SD controller service profile and passes the file down to the data chunking and dispersal layer. 11006 is the data dispersal layer, which creates data chunks out of the file and either replicates them or mixes in error correction codes such as Reed-Solomon based codes, simple XOR codes or any equivalent coding technique, as this invention can make use of any code or no code at all. Data is then converted to objects, and each object is uniquely named and steered to different cloud locations. System definitions are inserted at the SD controller. Once data is properly placed, all such parameters, such as target cloud profile, source data location, source file path, recovery file path and target cloud locations, constitute the additional metadata. This additional metadata is then stored in 11001 and replicated instantly to 11002. This additional metadata is then redistributed to any other on-premise gateways, such as 152, labelled GW in 11004, and also 164, labelled GW in 11005. These gateways run a uniquely built NFS server, which has a split data and metadata plane architecture. This also means that data and metadata do not need to be co-located.
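The path normalization, chunking, coding and object naming steps described above can be sketched as follows, using the simple XOR code that the text names as one option. This is a minimal sketch under stated assumptions: the universal path format, the SHA-256 based object names and the function names (`normalize_path`, `disperse`, `object_name`, `recover`) are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import List, Optional


def normalize_path(site: str, local_path: str) -> str:
    """Build a site-qualified, forward-slash path (assumed universal format)."""
    cleaned = local_path.replace("\\", "/").lstrip("/")
    return f"/{site}/{cleaned}"


def disperse(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size chunks plus one XOR parity chunk."""
    size = max(1, -(-len(data) // k))        # ceiling division
    padded = data.ljust(size * k, b"\0")     # zero-pad the last chunk
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return chunks + [bytes(parity)]


def object_name(universal_path: str, index: int) -> str:
    """Derive a unique object name per fragment (assumed naming scheme)."""
    return hashlib.sha256(f"{universal_path}#{index}".encode()).hexdigest()


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original bytes when at most one fragment is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        size = len(next(f for f in fragments if f is not None))
        rebuilt = bytearray(size)
        for f in fragments:
            if f is not None:
                for i, b in enumerate(f):
                    rebuilt[i] ^= b          # XOR of survivors yields the lost one
        fragments[missing[0]] = bytes(rebuilt)
    return b"".join(fragments[:-1])[:length]  # drop parity, strip padding
```

A single XOR parity fragment tolerates the loss of only one fragment; a Reed-Solomon code, which the text also mentions, would be the usual choice when more than one cloud location may be unavailable at once.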

In a traditional file server, the metadata and data of the file system being served are created from the same file system, with data co-located in the same storage volume in a single node or as part of different nodes. This does not allow metadata updates from a central control plane. Hence, as part of the invention, the NFS server module that runs in these gateways (152 and 164, running in sites 11004 and 11005 respectively) is built entirely with metadata and data separation in mind. The data plane of this NFS module also understands that data can be stored locally or remotely in private or public clouds, or on a mix of clouds when the cloud profile is hybrid-cloud. Once metadata is updated, any file which gets copied or migrated from 11007 can be accessed, manipulated and updated in real time. This is possible because the metadata is now available to the NFS module. Data is retrieved through the appropriate cloud APIs, brought to the GW (152 or 164), and delivered to the clients requesting the data over a NAS protocol. The same access is possible through gateway 164 as well.

Consider now the data stored in proprietary vendor systems 150 and 151 in the site 11004, and also 162 and 163 running in the site 11005. 150, 151, 162 and 163 are NAS appliances or could be file servers, capable of serving files to NAS clients. There could be many NAS clients. In the exemplary embodiment, only a few clients are shown, labelled PC. An additional component of the invention is a module running in the gateways 152 and 164 that copies or migrates the data from these servers, after leaving a symbolic link in the aforementioned servers, and repeats the steps followed by the system when data is initially copied from 11007. If the data is ingested from the gateway 152 in this manner, metadata does not need to be redistributed to 152, as it will always have the metadata. But the central control plane will then redistribute metadata to 164 and 156. Any of the files ingested into the system can now be accessed by an IT admin from other gateways in the corporate data center (11006). As another part of the invention, data can also be ingested from source data locations 157, 158, 159, 160 or 161. All data is brought into the cloud module first; the cloud module inspects the configuration data shared by the System Controller, creates the data chunks, sends the data to the appropriate clouds, and feeds new metadata to 11001. This metadata controller then resynchronizes the metadata to all gateways, as done for other data sources. So data from any source within the enterprise, at any data silo, can now be available universally. This is the core essence of Universal File Virtualization.

Also, data copies, migration and metadata resynchronization are all performed as instructed by the SD controller. As the same technology behind the invention also applies to data management for data stored from any data source, the invention can also be called universal data management or universal file management.

Universal File Virtualization also provides a universal data fabric, converging all the different data silos into a single local drive semantics. A UFS module running in any data center can now access any data in any of the silos, be it in SaaS, IaaS, PaaS, a remote office location or a branch office, as a file system folder, and do anything with it that a user can do with a file system. It brings total control, visibility and overall simplicity to the data infrastructure, without the worry of a single point of failure, as data is decentralized with universal de-duplication and erasure coding/replication, while metadata is centrally protected with continuous data protection mechanisms with replication. Corporations get an unprecedented data security and delivery experience for their unstructured, secondary storage systems.

11002, the secondary metadata controller, also provides recovery mechanisms, High Availability services for metadata, security monitoring services for every gateway deployed in corporate data centers, centralized log storage for every system, centralized configuration management, and various forms of threat detection, authenticity checking and customer telemetry, providing another layer of security violation detection in the context of cyber-security challenges. As security is part of the UFS module and is also built as another layer for monitoring, security functionality is also executed in layers and in different planes. This is another novel aspect of the invention, as no existing distributed file system considers security at all levels, though such systems incorporate encryption and authentication, which are only the basic aspects of security control.

Referring to FIG. 5, the data distribution aspects of the metadata plane are shown. Gateway systems (labeled 152 and 164) are the gateways that have the same shared data of the metadata plane. Metadata created at any such gateway is instantly synced to the central control plane primary node (labeled 11001), which is mirrored in the replica node. All these nodes are distributed across various data centers, separated by WAN or LAN links. For instance, if the metadata module running as part of gateway 152 generates any new metadata, it updates the primary node of the metadata plane, 11001. The primary node of the metadata plane then updates the sync-pending flags for the other gateway, 164, which subsequently syncs the metadata changes back to its own metadata module. If the primary node of the metadata plane, 11001, fails, the secondary node 11002 can take over the role of primary, and no disruption of service will happen.
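The update, sync-pending and failover behaviour described above can be modelled in a few lines. This is a simplified in-memory sketch, not the disclosed implementation: the class names, the per-gateway set of pending paths and the synchronous mirroring to the replica are all assumptions made for illustration.

```python
from typing import Dict, List, Set


class MetadataNode:
    """One node of the metadata plane (primary or replica)."""

    def __init__(self, gateways: List[str]) -> None:
        self.entries: Dict[str, dict] = {}
        # gateway id -> paths whose metadata that gateway has not yet pulled
        self.pending: Dict[str, Set[str]] = {g: set() for g in gateways}


class ControlPlane:
    def __init__(self, gateways: List[str]) -> None:
        self.primary = MetadataNode(gateways)
        self.replica = MetadataNode(gateways)

    def update(self, source_gateway: str, path: str, meta: dict) -> None:
        """Record new metadata and flag every other gateway as sync-pending."""
        for node in (self.primary, self.replica):   # mirror to the replica
            node.entries[path] = dict(meta)
            for gateway, paths in node.pending.items():
                if gateway != source_gateway:
                    paths.add(path)

    def pull(self, gateway: str) -> Dict[str, dict]:
        """Return the pending changes for a gateway and clear its flags."""
        node = self.primary
        changes = {p: node.entries[p] for p in node.pending[gateway]}
        node.pending[gateway].clear()
        self.replica.pending[gateway].clear()
        return changes

    def fail_over(self) -> None:
        """Promote the replica when the primary node is lost."""
        self.primary = self.replica
```

Because the replica carries the same entries and pending flags, a gateway that pulls after `fail_over()` still receives the changes it had not yet seen.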

In an embodiment, the metadata controller is an n-way distributed system, continuously replicating the changes from any site to all instances of the metadata controller. An instance of the metadata controller runs as a part of the on-premise gateway, while other instances run in the cloud. In the invention, data and metadata are truly separated. Hence, intelligent system mechanisms can be employed to drive the data movements across the federation of the storage systems. File storage is truly de-coupled from where it is accessed from, and by whom, by the methods driven by system controls through the metadata controller. ROBO data can be collected from various agent systems running in the user systems at ROBO sites and communicated to any of the on-premise sites where the server system for the agents is running, which receives all data, extracts metadata, transforms it in some ways, and sends it to the central metadata controller. The on-premise gateway also runs a file service as part of its components, which serves files to the local site as well as distributing them to other sites through the metadata controller. Data can be part of different applications or different storage services, and has to be translated into a uniform format so that any file in any location can be manipulated as part of a single large file system.

Hence, the present disclosure implements a Universal file system that encompasses various storage sites and storage application services. Explaining further the uniform metadata format, consider a file uploaded to a cloud service through a browser. Its metadata can be very minimal, such as file name, size and source IP or user name. Consider next the case of storing a file from a ROBO as part of an agent backup. There is then additional metadata, such as the time and day of the backup and the backup type, which needs to be translated to the same form as that of a browser-uploaded file. Similarly, when a file is originally created by the file server running as part of the on-premise gateway, file-system-specific metadata can be translated through a convenient mechanism. As another example, if the file is stored from a Windows client, it has special parameters known as Windows Access Control Lists (ACLs), which are not created when a file is migrated from a cloud service such as Google Drive. Therefore, in the present disclosure, default values are configured so that different systems can interoperate.
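The translation of source-specific metadata into one uniform record, with default values filling the fields a source does not provide, can be sketched as below. The field names of the universal record, the three source labels and the native key names are hypothetical; the disclosure describes the idea but not a concrete schema.

```python
from typing import Any, Dict

# Hypothetical universal record: every field is always present, and a
# default value stands in where the source system has no equivalent.
DEFAULTS: Dict[str, Any] = {
    "name": None, "size": 0, "owner": "unknown", "source": None,
    "backup_type": None, "backup_time": None, "acl": [],
}


def normalize(source: str, native: Dict[str, Any]) -> Dict[str, Any]:
    """Translate source-specific metadata into the universal record."""
    record = dict(DEFAULTS)
    record["acl"] = []            # fresh list so records never share state
    record["source"] = source
    if source == "browser_upload":
        record.update(name=native["file_name"], size=native["size"],
                      owner=native.get("user_name", "unknown"))
    elif source == "robo_backup":
        record.update(name=native["path"], size=native["bytes"],
                      backup_type=native["type"],
                      backup_time=native["timestamp"])
    elif source == "windows_client":
        record.update(name=native["name"], size=native["size"],
                      owner=native["owner"], acl=list(native["acl"]))
    else:
        raise ValueError(f"unknown metadata source {source!r}")
    return record
```

With this shape, a browser upload and a Windows client file produce records with identical keys; the browser upload simply carries an empty ACL list as its default.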

A user can also log in to a central portal, which is part of the metadata controller, where the user can configure the migration policies that drive data migration as the data ages. A policy can be as simple as moving a file from Google Drive to Amazon S3 after 6 months of inactivity, or as broad as migrating the data of every user at every site and storage location to multiple storage cloud services through information dispersal if it is older than one year. All migration across the federation of storage clouds is automated as part of the universal file system. All metadata movement and data movement needed to make the physical file storage location transparent, or truly virtualized, is automated as part of the universal file system. The invention makes every piece of data in any location or storage silo local to every other system, and hence it is called universal.
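The two example policies above (a move after 6 months of inactivity, and dispersal after one year) can be expressed as a small age-based decision function. The store label, the action names and the choice of last-access time as the age measure are assumptions made for this sketch, and 6 months is approximated as 182 days.

```python
from datetime import datetime, timedelta
from typing import Optional

SIX_MONTHS = timedelta(days=182)   # approximation of "6 months"
ONE_YEAR = timedelta(days=365)


def migration_action(current_store: str, last_access: datetime,
                     now: datetime) -> Optional[str]:
    """Return the migration step a file is due for, or None if it stays put."""
    idle = now - last_access
    if idle > ONE_YEAR:
        # Oldest data: disperse across multiple storage cloud services.
        return "disperse_to_multiple_clouds"
    if current_store == "google_drive" and idle > SIX_MONTHS:
        # Inactive Google Drive files move to Amazon S3.
        return "move_to_s3"
    return None
```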

Explaining further the core invention, the Universal File Virtualization methods and the Universal File System (hereinafter referred to as UFS), refer to FIG. 5 of the core architecture again. 11006 is a corporate data center, from which the main data protection officer or CIO may operate, while 11000 is the central metadata controller, which can be hosted in the cloud, hosted by a provider or hosted in the company data center. The metadata controller is itself self-protective, as the primary metadata node, 11001, is replicated in real time to the secondary metadata node, 11002. The system-defined controller module 11003 can provide system-defined instructions to both the primary and the secondary metadata node. When the primary goes off-line, the secondary metadata node kicks in and takes over all the services offered by the primary metadata node. External services can contact the secondary metadata node if they detect that the primary metadata node has gone off-line. System-defined instructions can be configured or executed through a Web-based management GUI, as an example. Various data services, security privileges, information regarding remote offices and branch offices, role-based access controls, data sharing policies, security policies and data services policies can all be configured, and are translated into system-defined rules for 11003, the SD controller. The SD controller drives the movement of data and metadata, as well as the delivery of various data services and data security services for the Universal file system architecture and its various capabilities for different use cases.

The cloud services interfacing module 200 uses the appropriate provider-specific cloud APIs to interact with the different cloud services shown as 157, 158, 159, 160 or 161. This can be, for example, the OAuth (Open Authorization) based G Suite APIs used to interface with Google applications. OAuth allows third-party services to access and manipulate user data on behalf of its owners, once the third-party providers are granted data access. Every SaaS provider provides its own specific APIs to read or update the metadata or data held in its services and data storage. Using the Google data API, one can retrieve files stored in Google Drive and the metadata of those files; Box, a popular cloud-based storage service, offers its own APIs to access its data; and so on.

The cloud service module thus brings in the data and metadata and can also update the data and metadata, as per the instructions from the SD controller, 11003. 11006 is a separate data dispersal layer wherein data can be grouped, partitioned, sharded or erasure coded and then transmitted to different cloud providers, a company-owned private cloud, or any combination of different clouds, forming a hybrid-cloud infrastructure. Data can be split across multiple clouds, or different types of data can be directed to different clouds, as desired by the company policies matching costs, security objectives and the contracts set up. For example, Amazon AWS™ has a special service for archival workloads at very low cost, and it also offers another class of service for online data at higher cost. The SD controller can direct all data placements into different tiers of cloud services offered by the same provider or by different cloud service providers, as per the company policies. 165, 166 and 167 are different public cloud service providers, for example AWS, Google™ Cloud and Microsoft Azure™, while 165C, 166B and 167A are private clouds. A hybrid cloud is formed by combining these in different combinations.

Just as data from cloud services is ingested into the central controller, data can also be consolidated from different branch offices of the company, such as 11007, 11005 and 11004. 11007 is a small office where only a few employees work with a few PCs, which direct the data and metadata to 201, which is part of the central system 11000. 201 is also called the ROBO module; it processes data and metadata streams from the agents installed in the different PCs running at the ROBO site 11007. ROBO stands for Remote Office, Branch Office. Similarly, any number of such remote offices can be connected to 11000. Branch offices 11005 and 11004 can also send data and metadata to 11000, in the same way that ROBO sites send their streams of data and metadata. In a different implementation of this invention, all ROBO and branch sites can send data directly to the data dispersal layer 11006, which then directs it to the actual cloud services. The data dispersal layer 11006 can be implemented as a standalone system, as a library attached to any module running in any of the systems in any of the locations in the diagram, or as an embedded system module running as part of the data transfer agents, such as 152 in the branch site 11004 or 164 in the branch site 11005, or the system agents running in the PCs of the ROBO, as in 11007.

In such an embodiment, data is transferred directly to public cloud services, while metadata is consolidated at the central metadata controller 11000. 11006 is the corporate data center, where 156 is the module that provides a file system level interface to every data asset ingested from all remote offices, branch offices and cloud services. 156 contacts 11001 for metadata updates, and 11002 if the primary fails. It has, in one embodiment, an embedded dispersal layer with the same functionality as 11006, through which it contacts various public cloud services to access data. The crux of the invention lies in the combination of 156, which creates a file system level experience for an end user; the centralized metadata controllers 11001/11002 with real-time replication; and the example branch gateways 152/164, which integrate data and metadata from the branches into the universal system. A file system such as NTFS in Windows allows a user to list directory contents, access files, change files, and copy files into and out of an NTFS partition. This is served by a module in the Windows kernel, which stores the actual user data on the disks, in different disk blocks. Before NTFS stores data, disk partitions are formatted to store metadata such as master blocks, file tables, locations of free blocks, etc. Actual file contents are stored in data blocks, while the metadata that locates files, such as the master boot record and file allocation tables, is also stored in separate disk blocks. This is fine for a disk-based environment, as it is one form of implementing a file system based on technologies without cloud.

The invention creates the same final experience for the user, but stores all the metadata needed to access user data from the cloud instead of from disk-based data blocks. The module that runs in 156 can be an independent Linux system or a Linux VM that can be accessed over a network using an IP address. This system can also be mounted using any standard NAS protocol. The invention implements every file system request generated at the OS layer and translates it to the appropriate equivalent request to the cloud. 156 is always updated with all metadata from the metadata controller. Actual data from the public cloud services is accessed on demand.

For example, when a user performs a directory listing, the OS translates this to the appropriate interface and passes the request to the file system interface of the module. The 156 module runs itself as an NFS server, interfacing to the OS through the kernel based NFS client. When the NFS client kernel module sends a readdir( ) RPC procedure, the NFS server, which is 156 itself, looks up all the metadata that it assembled from 11000 and constructs the readdir RPC reply, including all the file names, file attributes and file sizes required to enable the OS to reply to the end user. Similarly, for any real time data update on the file system, the NFS client sends READ or WRITE RPCs, which are interpreted by the NFS server module of 156. The module identifies which file data is needed from the file handle information in the RPC request and retrieves the correct file by translating the file request to an object request based on the metadata information. It is further to be noted that data flow or metadata flow can happen from any direction, with the exception that metadata always goes through the central controller and is then re-distributed, while data can flow directly between data sources and cloud storage services and/or the secure vault. User data can also go through the central controller. Branch server resources can, in some cases, send user data to other destinations through central metadata control. Central metadata control functions like a nervous system for all data and metadata flow of a distributed corporation. The purpose of disaggregating the channels for data exchange, control data or metadata exchange, and security data exchange is to enable an any-to-any communication paradigm. As the control plane, security plane and data plane are disaggregated, any UFS module can send metadata to the System controller and every other UFS module can receive it. As every UFS module sends data to data containers, every other UFS module can receive that data if it has access to the metadata.
As the UFS module runs a file system which is configured to work with split data and metadata, data access is enabled like a file system on local storage.
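By way of a non-limiting illustration, the readdir handling and the translation of a READ request into an object request described above could be sketched as follows. This is a minimal Python sketch and not the disclosed implementation: the names FileMeta, MetadataCache, object_id and location are hypothetical, the file handles are plain strings, and no NFS wire protocol is implemented.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    name: str
    parent: str      # path of the parent directory
    size: int        # file size in bytes
    mode: int        # permission bits
    object_id: str   # hypothetical id of the object holding the file data
    location: str    # hypothetical name of the cloud service or secure vault


class MetadataCache:
    """Local copy of the metadata synced from the central controller."""

    def __init__(self):
        self._by_handle = {}
        self._children = {}

    def sync(self, handle, meta):
        """Record (or update) the metadata received for one file handle."""
        old = self._by_handle.get(handle)
        if old is not None:
            self._children[old.parent].remove(handle)
        self._by_handle[handle] = meta
        self._children.setdefault(meta.parent, []).append(handle)

    def readdir(self, directory):
        """Build a directory listing purely from metadata, without fetching data."""
        handles = self._children.get(directory, [])
        return [(m.name, m.size, m.mode) for m in map(self._by_handle.get, handles)]

    def to_object_request(self, handle, offset, count):
        """Translate a READ on a file handle into an object range request."""
        m = self._by_handle[handle]
        last = min(offset + count, m.size) - 1
        return {"location": m.location, "object_id": m.object_id, "range": (offset, last)}
```

In this sketch the listing is answered from the local metadata alone, while the object request is only a description of what would be fetched on demand from the cloud service or vault named in the metadata.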

In one embodiment, when the secondary metadata controller plays the role of a security control point, it monitors all systems having data resources for anomalies, corrupted files and malicious activities; virus checking, configuration file hardening and related security monitoring services can also be performed, as a separate security plane. All components in the UFS module can get a gold copy of configuration files, security configuration for OS attributes, and management data such as the various services enabled for each UFS module, and various identity verification services can be performed. As in any standard system, security and management data are set through a graphical user interface or a command line interface at the System Controller. The System Controller then distributes them to the UFS modules and the security controller. This separate security plane also performs various security management services for cyber security protection. It can also be called the cyber security automation center, Security Plane Controller, Security Operation Center or simply Security Vault. The Security Vault constantly monitors every storage input and output activity going on in UFS modules as well as secure vaults. A security administrator can configure various policies and can instruct the security controller to remotely shut down the systems holding the data stored in secure vaults or at UFS hosts. In this way, the security vault offers the capability of multi-site storage intrusion detection, which is unheard of in the world of storage.

Referring to FIG. 7, central system controller F-03 is at the fulcrum of the invention. The SD-controller, metadata controller and UFS module are the main components as per one embodiment of the invention. The metadata controller sends a backup of continuous metadata changes to the security controller for High Availability and Disaster tolerance. Every site of a corporation has a site module of the Universal File System (UFS). The metadata controller also has a UFS module. The UFS module mainly performs the role of a data connector, connecting data sets from data sources and also making data available to a user as a file system through various NAS (Network Attached Storage) protocols. The UFS module also connects data through various SaaS provider APIs when data cannot be accessed through NAS. Once a data stream is received, regardless of the type of data connection interface, metadata is sent to the metadata controller, and data is extracted and sent to the data containers storing user data, shown as F-06, F-07, F-08, F-09 and F-10 in the figure. As F-06, F-07, F-08, F-09 and F-10 contain user data securely, these components are also defined as data containers or secure vaults. Every instance of UFS, the System controller (F-03) and the Security controller (F-05) has the capability of performing various data services, such as data dispersal and data transformation to object format, and of sending transformed user data to data containers. This logical part of all interfacing capabilities is defined as the data controller. The System controller, through the SD interface, configures the various data containers attached to every data controller instance that is part of a UFS module or the System controller. Data is exchanged from the data controller, which is a logical module running in the UFS module or as an integral part of the System Controller. Data is sent through the data path, shown as data lanes; metadata is exchanged across the control path, shown as metadata lanes; and all security management and automation is exchanged through the security plane, shown as security lanes.
The security profile of a user or data silo can be configured through a GUI (Graphical User Interface). For example, security profile of a data silo can be to Security controller has security configuration data and a security engine. The security engine processes all the security event data received at the security controller through security lanes and determines whether there is any anomaly. If an anomaly is found, the security engine initiates a real time response. For example, in one embodiment a security event may be 3 consecutive authentication failures at any UFS module. The SD controller may have configured the security response parameter as Remote System Lockdown. This configuration is the security profile data associated with the data set. In this case further logins are disallowed. Similarly, if unauthorized storage resource access is observed, the security controller will send a message to the UFS module or secure vault to shut down the system. This brings a new dimension to the CIA triad of information security: in addition to Availability, the invention brings to light that Un-Availability to rogue users is also a capability of an information assurance platform.
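The lockdown example above, in which 3 consecutive authentication failures at a UFS module trigger a Remote System Lockdown response, could be sketched as follows. This is an illustrative Python sketch under stated assumptions: the class SecurityEngine, the method on_auth_event and the response string are hypothetical names, and the assumption that a successful login resets the failure count follows from the word "consecutive" rather than from any stated rule.

```python
from collections import defaultdict


class SecurityEngine:
    """Toy security engine: counts consecutive login failures per UFS module."""

    def __init__(self, max_failures=3, response="REMOTE_SYSTEM_LOCKDOWN"):
        self.max_failures = max_failures   # threshold taken from the example in the text
        self.response = response           # configured security response parameter
        self.failures = defaultdict(int)
        self.locked = set()

    def on_auth_event(self, module_id, success):
        """Process one authentication event; return the response to send, if any."""
        if module_id in self.locked:
            return self.response           # further logins are disallowed
        if success:
            self.failures[module_id] = 0   # a success breaks the consecutive run
            return None
        self.failures[module_id] += 1
        if self.failures[module_id] >= self.max_failures:
            self.locked.add(module_id)
            return self.response
        return None
```

Each event would arrive over a security lane; the returned value stands in for the message that the security controller sends back to the affected module.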

UFS modules are not necessarily limited to being data connectors. A UFS module can also act as a primary data source itself. Client systems can directly mount a UFS module as a virtual NAS system and can store data with file system semantics. All dataset handling logic remains the same.

The SD controller can configure various file storage protection policies and parameters. It can set the number of secure vaults to 1, 2, 3 or more. It can also set the number of cloud services similarly. In one embodiment, all data objects are stored only in public cloud services. In another embodiment, some objects are stored in secure vaults and some in public cloud services. All such policies are configured and managed through the system defined controller, which then programs the control plane by instructing the metadata controller, the data plane by instructing the secure vaults and UFS modules, and the security plane. As the architecture has the unique property of disaggregated control plane, data plane and security plane, security services are uniquely controlled through the control plane, regardless of where the data is stored. This also makes it possible to integrate disparate storage protocols at different sites, as well as different data trapped in different sites, into a single, virtual universal file system with security by design and by default. Without a separate security plane detached from the control plane and data plane, such a capability cannot be built. Without a separate control plane, central control and visibility cannot be achieved. Yet another property of the data plane is that it is decentralized. With decentralization comes the capability of having no single point of breach or cyber-attack. The invention offers a novel way of providing true cyber resilience and protection from data thefts and breaches with a decentralized data plane, where every object is securely split into pieces that are stored in different data vaults at different locations, with any single piece revealing no information and the loss of any single piece having no impact on data availability.

Another salient feature of the invention is the way it protects data sets from ransomware by means of a known gold copy of the data. UFS has a built-in concept of versions, which are typically updated at every backup time. This is called a backup epoch. Between epochs, new data is stored in a temporary partition. New data sets are then subjected to ransomware anomaly detection. Each file object is examined for changes against the previous file object. If any file change matches the ransomware attack signatures, a real time alert is generated and IT staff are engaged for manual verification and to match data validity parameters, such as a modification of a subset of a file being a normal pattern. If verification finds no ransomware attack, the new data is applied to the old known copy. Otherwise, the old good copy is preserved. UFS keeps track of a rich set of file versions across data silos, which makes it easy for an IT administrator to perform the recovery. The security control plane performs real time ransomware attack signature monitoring as well. Hence, a ransomware attack is detected either as part of a new backup epoch update or through the pro-active monitoring process. When new data fails to match the ransomware attack signatures, it meets the data qualification. Data qualification parameters can be set as frequency of data changes, amount of data changes etc.
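The epoch update described above could be sketched as follows. The disclosure does not specify the ransomware attack signatures, so this Python sketch substitutes one assumed heuristic, a jump to near-random byte entropy (as produced by encryption), purely for illustration. The function names, the 7.5 bits-per-byte threshold and the manually_cleared parameter, which stands in for manual verification by IT staff, are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def matches_signature(old, new, threshold=7.5):
    """Assumed signature: content that was not random-looking becomes random-looking."""
    return shannon_entropy(old) < threshold <= shannon_entropy(new)


def commit_epoch(gold, staged, manually_cleared=frozenset()):
    """Merge staged files into the gold copy; flagged files keep the old version
    unless they were cleared by manual verification."""
    new_gold = dict(gold)
    alerts = []
    for name, new in staged.items():
        old = gold.get(name)
        if old is not None and matches_signature(old, new):
            alerts.append(name)              # real time alert for IT staff
            if name not in manually_cleared:
                continue                     # preserve the old good copy
        new_gold[name] = new
    return new_gold, alerts
```

A production detector would combine several qualification parameters, such as frequency and amount of change, rather than a single entropy test.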

As user data is fragmented according to information theory, based on erasure coding combined with compression, encryption and deduplication, data is further optimized at the compression level and the deduplication level. Since the secure vault is not listening on any IP or known port, network worms such as ransomware cannot penetrate the systems hosting secure vault modules. So, in a typical data flow, data gets ingested, packaged in a data set, and sent to the control plane for metadata processing, to the security plane for security data processing, and to the data plane for storage of the file's user data after the configured data services are applied. On a data access, metadata and data are separately extracted to provide local file system access semantics. As UFS is based upon a split data plane, metadata plane and security plane architecture, different data silos can be stitched together even though user data is stored at different locations or connected to end systems through different storage protocols. If the security plane were intrinsically part of a single location, it would be very difficult and complex to perform security monitoring and security control on other UFS modules and secure vaults. The combination of the disaggregated architecture of control plane, decentralized data plane and security plane with converged data, metadata and security services makes UFS very novel and market-first in the context of data stored in different sites and clouds connected through different storage protocols.

Together, with the centralized metadata controller acting as the control plane, the decentralized data plane running in various office locations and storage silos in the clouds, and the security plane running in the secondary metadata controller or as a separate service in a separate data center, the UFS system becomes integrated and highly available, with data redundancy built in and with total security services. As the target user data is stored across various cloud storage services, with erasure coding or replication across them, there are no vendor lock-in issues or outage issues affecting the availability of user data when needed for recovery.

UFS (for Universal File System, which is part of this invention) is not a file system for primary storage use cases and was not invented for that use case. UFS provides a data platform for universal data governance, GDPR compliance and cyber security, with a central control plane and a decentralized data plane in a split metadata and data plane architecture. Actual user data is decentralized, as data is stored across different cloud storage services in a hybrid cloud architecture. Metadata is centrally synced to the core UFS module. With all metadata in one place, a data protection officer now experiences universal data visibility and control. As user data is not in one place, data is better protected from cyber security related attacks. Storage can be divided into shards or erasure coded, and the resulting fragments can be sent to different cloud storage services like AWS, Google Cloud or Azure, or to on-premise private cloud storage services.

As the Universal File System decouples file storage assets from their actual location, it implements Universal File Virtualization, driven by system instructions input through 11003, the SD Controller.

As the Universal File System can access any data and move any data from any location, it also makes the data services virtual. Data can be backed up from any location by steering a copy to the clouds; the same copy can be moved to archives by removing the primary copy from the data sources; and any file object can be migrated from any data source to any other data source, just as it moves data to the cloud. Hence, a customer using UFS does not need to purchase separate systems for backup, cloud archiving, storage migration etc. With the system defined control plane, any data can now be shared with any other user having the access rights, allowing universal file sharing. With the Universal File System, any file data object can now be searched universally, and any form of dark data can be discovered. With the help of the Universal File System, any file having Personally Identifiable Information (PII) or other sensitive data can be detected easily, as the metadata gathers information on files having sensitive data, and this is available at the central controller for universal search.

All file system activity of the Universal File System is securely audited. All audit logs are first sent to multiple data clouds in chunks, and then the SHA digest of every such audit chunk is stored in an immutable storage medium such as tapes, sent to popular e-mail systems, or sent to a blockchain service offering tamper proof storage endurance SLAs. Various forms of content based search tools can be employed to detect sensitive files, and these can be applied universally across the Universal File System. The universal data fabric, which is the core premise of the Universal File System, gives unprecedented data privacy controls over user data, as it allows central control and ownership management of files. All data can be delegated to a specific user, based on ownership rights or role based access control. Access rights can be revoked based on business needs, all file activities can be tracked, and full life cycle management and end to end file security policy management can be easily configured at the central control plane.
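The audit scheme above, in which log chunks go to the data clouds while their digests go to an immutable medium, could be sketched as follows. This is a minimal Python sketch: the text says only "SHA", so the choice of SHA-256 is an assumption, and chunk_log, seal and tampered_chunks are hypothetical names. The immutable medium is represented simply by the returned list of digests.

```python
import hashlib


def chunk_log(log, chunk_size=1024):
    """Split an audit log into fixed-size chunks for upload to the data clouds."""
    return [log[i:i + chunk_size] for i in range(0, len(log), chunk_size)]


def seal(chunks):
    """Digests to be written to the immutable medium (tape, e-mail, blockchain)."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def tampered_chunks(chunks, sealed):
    """Indices of chunks whose current digest no longer matches the sealed one."""
    return [
        i for i, (chunk, digest) in enumerate(zip(chunks, sealed))
        if hashlib.sha256(chunk).hexdigest() != digest
    ]
```

Because the digests are held separately from the chunks, an attacker who alters a chunk in one of the data clouds cannot also alter its sealed digest, so the change becomes detectable.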

Any file having any type of sensitive content can be detected by the UFS, no matter where the file is stored. The UFS module has distributed sensitive data detection intelligence built in. All data in the cloud infrastructure, be it in SaaS, PaaS or IaaS, can be fully controlled from the on-premise gateway, which can be running in any customer-owned data center. This capability provides an “outsource storage, without outsourcing data control” experience to customers. Without a universal data fabric offering universal control and visibility, no privacy and security controls can be enforced by IT. IT administrative rights can themselves be hierarchical. In the unlikely event that a new breed of virus, such as ransomware, enters the data hosts and tries to modify a file, for example by encrypting it to claim a ransom, the encrypted file will just become another version, as the original version remains intact and tamper proof. Immutability is built in at the system level, and can be further verified by a TPM (Trusted Platform Module) or a virtual TPM in an exemplary embodiment.

As the Universal File System is fully driven by system defined instructions, a data administrator can now perform universal data services operations in a single sweep, such as removal of all files with the extension .jpeg, or all files owned by user john, and the operation can be applied to all data sources. Similarly, through a single command from the SD controller, all file data assets stored in all data sources can be backed up to dispersed cloud services or the secure vault, and multiple data sources can be selected and archived to the dispersed cloud in a single workflow, simplifying the file data management operations of a global corporation. For a corporation having data stored in different silos and various forms of cloud services, such simplified mechanisms for universal data services are very critical.
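A single universal command of the kind described above, for example removal of every .jpeg file or every file owned by user john across all data sources, could be planned from the consolidated metadata roughly as follows. This Python sketch is illustrative only: Entry, select and plan_removal are hypothetical names, and the catalog is a plain list standing in for the central metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str   # data source name, e.g. a cloud service or branch office
    path: str
    owner: str    # universal owner id


def select(catalog, extension=None, owner=None):
    """Entries that satisfy every criterion that was supplied."""
    hits = []
    for entry in catalog:
        if extension is not None and not entry.path.lower().endswith(extension):
            continue
        if owner is not None and entry.owner != owner:
            continue
        hits.append(entry)
    return hits


def plan_removal(catalog, extension=None, owner=None):
    """Group matching paths by data source, ready to dispatch to each source."""
    plan = {}
    for entry in select(catalog, extension, owner):
        plan.setdefault(entry.source, []).append(entry.path)
    return plan
```

The resulting plan has one work list per data source, which reflects the idea that the command is issued once at the controller and then fanned out to every silo.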

In addition to the above data services, different variations of data governance, data forensics, cloud data life cycle services and cloud data storage deduplication can be performed easily through such a Universal File System, which provides overarching data flow and metadata flow pathways, allowing any data service to be offered to any data object, with total decoupling of the data object from the data source. The core UFS host agent module, 156, which is also an NFS server module, is explained in detail below.

Referring to FIG. 6 with Labels 1A and 1B:

UFS host module 1A runs in a virtual machine or in a physical system on a Linux machine. User clients or data users can contact this system through a Samba server or through NAS clients. If access is performed over NFS mounts, as shown by 10, all NAS protocol requests are transmitted over the NFS protocol to the UFS module core, which is the file system driver, 40. If access is performed over a Samba server, which is mounted to a Windows client machine, the Samba server can in turn host the storage through an in-kernel NFS client, which in turn redirects the request to the UFS core module over the NFS interface. The UFS core module stores data in storage partition 51 and metadata in storage partition 50. Metadata is also implemented in flat files, so any file system folders can also be used as storage partitions. Here, data and metadata are stored in separate directory partitions or in separate file systems. 1B is the central metadata controller, which keeps all consolidated metadata. M1, shown as the data line connecting 1A and 1B, indicates the metadata flow in both directions. Metadata flows from the central controller to the UFS host module when a data update happens at other sources. Metadata flows from the host module to the central controller when a data update happens at the UFS host module itself. C1 indicates the actual user data flow from the host module through the hybrid cloud storage layer, via the TCP/IP stack of the system running the UFS module. The metadata controller will have one or more metadata nodes for High Availability. 80 is the interface for providing system defined data services.

When data is synced from the central metadata controller, it also carries all information such as the actual data source, for example which cloud service (G-Suite, Box and so on) or which remote office (the location name), and the data owner in terms of a universal owner ID. Every branch or remote office and every cloud service gets registered in the central control plane 11000, and unique user ids are created for every user in every type of service. The same user id, in whatever form, gets embedded in the metadata, which is synced to 156. 156 will also be referred to as the Universal File System, Universal File Manager or UFS module. As the directing of data to various cloud services and the updating of metadata to the Universal File System are driven by clear instructions at system level from the module 11003, the Universal File System is also called the Universal File Virtualization system. The invention stores data in the hybrid cloud layer, with or without the dispersal layer. Metadata is stored in 11000 and also in the VM or the Linux system running the Universal File System. The Universal File System can also be implemented on other operating systems, like Windows or Mac OS, as part of a different embodiment.

One example of a universal ID is the email address of the employee, which is unique across the organization. The Universal File System, based on the metadata, classifies data in the file system into various folders according to the various types of data sources. For example, G-Suite, Box, Dropbox, Remote office in London, South Africa, and Branch in London will be displayed as different folders. Data can be displayed in different forms as needed by the company. Data sources may send data streams in different forms or through different interfaces. For example, remote offices may send streams in a tar file format, which are then processed by the ROBO module in the central controller; it splits the data streams from the metadata streams and stores data to the clouds, while metadata is synced back to the UFS module running in the corporate data center. Similarly, cloud data sources send data streams through different cloud interfaces, which are then processed by the cloud module in the central controller, splitting data and metadata. Branch gateways 152 at 11004 (representing one of the branch sites in one embodiment of the invention) and 164 at 11005 also split data from metadata and send data in object format using direct cloud APIs, such as S3, or in an object-like format, similar to CDMI, to a private storage cloud service hosted by the company. Branch gateways may send metadata to the central controller, which then gets synced to the UFS module.

In essence, all metadata from all data sources is consolidated at the central controller, replicated to the secondary, and then synced back to the UFS modules, such as 156 running at the site 11006. There can be one or more instances of the UFS module. UFS modules, the central metadata controller, branch gateways and ROBO modules all communicate in a network tolerant manner. In one embodiment, this can be provided by SD-WAN (System Defined Wide Area Network). The SD-WAN controller can also be hosted as part of the central metadata controller, and can work in tandem with the SD controller module 11003. In this case, UFS file delivery can work as an SD-WAN native Wide Area File Services experience for the customer.

As new regulations like GDPR (General Data Protection Regulation) require universal visibility and control of all data regardless of data location, this invention provides a unique benefit by offering a single source of truth for all data and a way to manage all file storage assets in a single sweep. As UFS brings all data to the data center as a single logical drive, a data protection officer having access to any virtual machine running the UFS module (the data controller node, in GDPR parlance) can locate any data asset and apply any form of permission control, data control and data management to this data, even when the actual data is stored through an outsourced storage service (the data processor, in GDPR parlance). Through the UFS modules, any data can be located universally and deleted regardless of where it is stored. With the UFS file system, data can now be deleted whether the actual data is stored in G-Suite, Box, branch office data servers or the local data stores accessible to the UFS module.

There are many distributed file systems, but all require deploying the various parts of the file system with the same form of data sources and interfaces. UFS allows heterogeneous data sources, like system agents at remote offices, OAuth based interfaces at cloud services, and system agents deployed in cloud servers (as in 157 and 158 for hosted servers in the cloud). The invention connects all these disparate data sources, located in any part of the world and operated by different providers, into one logical drive. This invention provides a new file system level interface that can universally access and manipulate data stored in any cloud service, any SaaS service, any cloud based server or any data center based server as a single logical pool. The Universal File System is also controlled and programmed by a system defined controller and has a split data and metadata plane architecture.

The universal module is built upon an n-way distributed unit, which is another key building block of the invention. The metadata module was also specifically built for file system metadata. Every file object has a unique number for its ancestral distance from the root of the tree, a positional index with respect to the other members at the same distance, including objects having different parents, and a positional index with respect to the other members having the same parent. All child members of the same parent are stored within the same file, allowing locality of reference on metadata lookup. As all metadata is stored in flat files, managing the metadata is easier. Every metadata write operation also generates a write ahead logging journal entry, which is then synced back to the central metadata controller. In this way, regardless of where the IO operation happened at the data sources, data is steered to cloud storage services, while metadata is centrally consolidated and then resynced to the systems running the UFS host modules. At any given time, the UFS system may not hold the data that a user accesses. The UFS module will, however, transparently bring in user data from the clouds on demand.
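The three numbers assigned to every file object above (distance from the root, position among all objects at that distance, and position among siblings) could be computed with a breadth-first traversal, as sketched below. This Python sketch is an assumption about one way to derive such indices; the function name index_tree and the dictionary representation of the tree are hypothetical, and zero-based numbering is chosen for convenience.

```python
from collections import deque


def index_tree(children, root="/"):
    """Map each node to (depth, index within its level, index among its siblings).

    `children` maps a parent name to the ordered list of its child names.
    """
    result = {root: (0, 0, 0)}
    next_level_index = {}            # how many nodes have been seen at each depth
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for sibling_index, child in enumerate(children.get(node, [])):
            level_index = next_level_index.get(depth + 1, 0)
            next_level_index[depth + 1] = level_index + 1
            result[child] = (depth + 1, level_index, sibling_index)
            queue.append((child, depth + 1))
    return result
```

Because the traversal visits one parent's children together, the entries that the text says are stored in the same flat file are also produced contiguously.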

Many cloud users now get a fine data control experience, as all data in cloud based collaboration or SaaS services is now made available through UFS as if it were on a local drive. With this data control, cloud data security is enhanced. Data in cloud services and SaaS services is otherwise integrated through various APIs offered by the provider and always ends up in a separate data silo. Data in remote offices also ended up in separate data silos prior to this invention. The Universal File System converges all such data silos into a single logical drive.

The single logical drive of the Universal File System does not store the actual user data other than for the purpose of data processing. Once data has not been accessed for longer than a threshold period, it is migrated back to the decentralized cloud storage layer. Every CIO, IT head or data protection officer looks for central control and visibility over their universal data, which is distributed or fragmented across various storage silos. At the same time, they cannot centralize the actual user data, as that would create a single point of failure at the infrastructure level. While they want centralized data control and visibility, they strongly desire a decentralized storage layer for maximum data security and availability. The Universal File System provides this unique benefit to the market.

The invention thus brings out a novel file system for universal data sources, which also implements a set of novel data services fully controlled by system defined user commands, truly realizing the potential of system defined Universal File Services or Wide Area File Services and universal file storage virtualization, with the integration of a federation of hybrid cloud storage infrastructure. Every UFS module receives configuration information from the System controller that enables or disables particular data services. For example, at a particular data silo, data compression and data deduplication may be configured to be enabled, and encryption and Reed-Solomon erasure coding to be disabled. The order of data services can also be configured, for example compression first and then deduplication. The data controller part of this UFS module, with this configuration, executes only compression and deduplication, in this order. The UFS module then sends the transformed data, as binary objects, to data containers. The UFS module then sends to the System controller the metadata describing the object id, the object location and the security configuration data, such as whether encryption is active and the status of data services. The System Controller re-distributes this to the other UFS modules. On data requests from a UFS module, the UFS module applies the data services in reverse order.
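The configured ordering of data services, and their application in reverse order on read, could be sketched as follows for the example configuration above (compression first, then deduplication, with encryption and erasure coding disabled). This is an illustrative Python sketch: CHUNK_STORE stands in for a data container, the manifest format (newline-separated SHA-256 digests) is an assumption, and all function names are hypothetical.

```python
import hashlib
import zlib

CHUNK_STORE = {}   # stand-in for a data container holding unique chunks


def dedup(data, chunk_size=4096):
    """Store each distinct chunk once; return a manifest of chunk digests."""
    digests = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        CHUNK_STORE.setdefault(digest, chunk)
        digests.append(digest)
    return "\n".join(digests).encode()


def rehydrate(manifest):
    """Inverse of dedup: rebuild the byte stream from the manifest."""
    if not manifest:
        return b""
    return b"".join(CHUNK_STORE[d] for d in manifest.decode().split("\n"))


# Each enabled service is a (forward, reverse) pair operating on bytes.
SERVICES = {"compress": (zlib.compress, zlib.decompress), "dedup": (dedup, rehydrate)}


def apply_services(data, order):
    """Write path: run the enabled services in the configured order."""
    for name in order:
        data = SERVICES[name][0](data)
    return data


def reverse_services(data, order):
    """Read path: undo the same services in reverse order."""
    for name in reversed(order):
        data = SERVICES[name][1](data)
    return data
```

The `order` list plays the role of the configuration received from the System controller; a disabled service is simply absent from it.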

Another aspect of the invention is that, if user data is not stored in public storage clouds, it is stored in a cyber-secured data vault hosted within the company premises, again in a decentralized architecture. This vault is further referred to as the Secure Vault, and employs secure network isolation technology to protect the data from cyber security related attacks. Such a data vault will typically be able to store up to 96 Terabytes per vault. The main uniqueness of this Secure Data Vault, which stores the user data portion of the Universal File System, is that cyber-attacks like a ransomware virus cannot enter the data vault over a network connection, as no transport protocol connection is allowed from any system in the network (LAN, WAN, MAN or cloud) to the data vault. The data vault uses a special technology in which the data vault itself decides which system it can get data from and send data to, and connects to that system by itself, with a control connection that it initiates to the other system.

The secure data vault employs special TCP connection setup and data transfer technologies in such a way that data can be synced synchronously from Universal File System modules, metadata controllers or branch gateways to the data vault, without any TCP or other transport connection being made from external systems to the data vault. The Secure Vault achieves this capability by playing the role of “Client” in the transport connection phase and giving the “Server” role to the other, selected system, which is identified by the master controller node running in the central metadata controller. In the data transfer phase, the secure vault changes its role from “Client” to “Server”, while the external data source changes its state from “Server” to “Client”, so as to be able to send data to, and receive data from, the secure vault synchronously. This transition is made right after the TCP three-way handshake is performed and just before the data transfer begins, with the secure vault itself waiting for data to arrive from the selected data sources. Additional control and monitoring intelligence detects whether such external data sources are absent from the approved list of data nodes that have permission to exchange data with the secure vault. Additionally, the metadata controller node runs Machine Learning and AI based anomaly detection and behavioral data collection to detect whether any unwanted network data activity directed at the secure vault is taking place, flagging the event as a potential cyber-attack or ransomware activity.
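The role reversal described above could be sketched with ordinary TCP sockets as follows. In this Python sketch the vault never binds or listens on a port: it dials out to an approved data source (the transport "Client" role) and then waits to receive data on that connection (the data "Server" role). The function names, the 8-byte length prefix and the "OK" acknowledgement are assumptions made for illustration and are not taken from the disclosure.

```python
import socket
import threading


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection early")
        buf += part
    return buf


def data_source(listener, payload):
    """External system: 'Server' for connection setup, then pushes data like a client."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(len(payload).to_bytes(8, "big") + payload)
        recv_exact(conn, 2)                      # wait for the vault's acknowledgement


def secure_vault_pull(source_addr, approved_sources):
    """Vault: initiates the connection itself, then waits for data like a server."""
    if source_addr not in approved_sources:
        raise PermissionError(f"{source_addr} is not an approved data node")
    with socket.create_connection(source_addr, timeout=5) as sock:
        size = int.from_bytes(recv_exact(sock, 8), "big")
        data = recv_exact(sock, size)
        sock.sendall(b"OK")
    return data


def demo_transfer(payload):
    """Wire one data source and one vault together on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))              # ephemeral port on the data source
    listener.listen(1)
    addr = listener.getsockname()
    worker = threading.Thread(target=data_source, args=(listener, payload), daemon=True)
    worker.start()
    try:
        return secure_vault_pull(addr, approved_sources={addr})
    finally:
        worker.join(timeout=5)
        listener.close()
```

Since the only listening socket belongs to the data source, a port scan of the vault in this sketch would find nothing to connect to, which is the property the text relies on.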

Organizations often lose track of some types of data, either believing that it never existed or having forgotten the path names, locations etc.; such data is generally classified as dark data. The Universal File System allows the data protection officer to search and locate files based on path names, content, the time of storage, the source data location, user id and business events, as UFS metadata has the capability to embed extra intelligence to tag files based on such parameters and to allow lookups based on those parameters.

Another special feature of the Universal File System is that these mechanisms further secure the decentralized data vaults from cyber security challenges or attacks like ransomware. There will be at least two secure vaults if user data is not stored in clouds. User data of the Universal File System may be decentralized across any combination of secure vaults running on company premises and a pool of hybrid cloud resources. When one of the data vaults is down, data availability is not affected. Storage may be replicated or erasure coded across data vaults. Data vaults may run in on-premise data centers or in a hybrid-cloud infrastructure.
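The availability property stated above, that one vault being down does not affect data availability, could be illustrated with the simplest possible erasure code: two data fragments plus one XOR parity fragment spread over three vaults. This Python sketch is a deliberately simplified stand-in for the Reed-Solomon coding mentioned elsewhere in the text; the vault names and function names are hypothetical. It demonstrates redundancy only: the fragments here are plaintext halves, so confidentiality would have to come from the separate encryption service.

```python
def xor_bytes(x, y):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(i ^ j for i, j in zip(x, y))


def split_with_parity(data):
    """Split data into two halves plus one parity fragment, one per vault."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")           # pad so both halves are equal length
    fragments = {"vault-1": a, "vault-2": b, "vault-3": xor_bytes(a, b)}
    return fragments, len(data)


def recover(fragments, length):
    """Rebuild the data from any two of the three fragments."""
    a, b, parity = (fragments.get(v) for v in ("vault-1", "vault-2", "vault-3"))
    if [a, b, parity].count(None) > 1:
        raise ValueError("more than one vault is unavailable")
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return (a + b)[:length]
```

Removing any single entry from the fragment dictionary models one vault being down; the original length is kept as metadata so that the padding can be stripped on recovery.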

Yet another aspect of the invention is that every branch gateway, UFS core module or the central metadata controller has the ability to compress and de-duplicate data chunks across universal data sources. De-duplicated chunk hashes, which can be based on MD5 or on various generations of SHA based algorithms, are stored in the central, highly available metadata controller, from which they can be retrieved by any node that de-duplicates data, so any duplicate chunk hash can be looked up by any node that is part of the Universal File System. This aspect of the derived inventive method is otherwise not available in any distributed file system. Additionally, each de-duplicated chunk is further stored in a redundant manner with Reed-Solomon based erasure coding technology. Implementing universal file storage de-duplication as part of the functionality of a file system in this way makes the Universal File System a best fit for storing less active data sets securely and with optimum storage utilization. The same data, under different file names in a cloud service such as G-Drive, Box or Dropbox, on a user PC in a remote office, or in a file on a server in the data center, will now reduce to a single set of unique data blocks. Other global de-duplication systems do not have this ability to span heterogeneous data silos, and also do not store the data.
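By way of non-limiting illustration only, the chunk-hash lookup described above can be sketched as follows. The sketch assumes fixed-size chunks and SHA-256; the class ChunkIndex is a hypothetical in-memory stand-in for the central metadata controller's chunk-hash store, and the function names are likewise illustrative.

```python
import hashlib
from typing import Dict, List

class ChunkIndex:
    """Hypothetical stand-in for the central metadata controller's chunk-hash table."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        """Store the chunk only if its hash is new; always return the hash."""
        digest = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(digest, chunk)
        return digest

def dedup_file(data: bytes, index: ChunkIndex, chunk_size: int = 4) -> List[str]:
    """Split data into fixed-size chunks and return the list of chunk hashes (the recipe)."""
    return [index.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def restore(recipe: List[str], index: ChunkIndex) -> bytes:
    """Rebuild a file from its recipe by looking up each chunk hash."""
    return b"".join(index.chunks[digest] for digest in recipe)
```

Two files ingested from different silos that share content map to the same hashes, so the shared chunks are stored once while each file keeps its own recipe.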

The core UFS module 156, at the location labeled 11006, can also run from any other location. An update on one gateway results in the other gateways being synced in near real time, controlled by the central SD controller (labeled 11003). In this way files can be shared, distributed or made available for global access across all locations of the company spread across a Wide Area Network, which gives the invention its title of Secure, Wide Area File Services. All data services, such as backup of the data at any location or migration of files between locations, are centrally controlled by the system-defined controller; this can also be viewed as the Universal File System having system-defined data services. Every file at a data source is de-duplicated against a universal chunk database, compressed, encrypted with a random or user-supplied key, then erasure coded and sent to secure data vaults or various clouds, all under the control of the SD controller.
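By way of non-limiting illustration only, the compress-then-erasure-code portion of this pipeline can be sketched as follows. The sketch substitutes a single XOR parity fragment for the Reed-Solomon coding named in the disclosure, so it tolerates the loss of only one fragment, and it omits the encryption step because the Python standard library provides no suitable cipher. The function names encode and decode and the fragment count k are assumptions.

```python
import zlib
from typing import List, Optional

def encode(data: bytes, k: int = 3) -> List[bytes]:
    """Compress data, split it into k equal fragments, and append one XOR parity fragment."""
    packed = zlib.compress(data)
    body = len(packed).to_bytes(4, "big") + packed  # length prefix so padding can be removed
    frag_len = -(-len(body) // k)  # ceiling division
    body = body.ljust(frag_len * k, b"\0")
    frags = [body[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytearray(frag_len)
    for frag in frags:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return frags + [bytes(parity)]

def decode(frags: List[Optional[bytes]]) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, frag in enumerate(frags) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    frags = list(frags)
    if missing:
        frag_len = len(next(frag for frag in frags if frag is not None))
        rebuilt = bytearray(frag_len)
        for frag in frags:
            if frag is not None:
                for i, byte in enumerate(frag):
                    rebuilt[i] ^= byte  # XOR of all survivors equals the lost fragment
        frags[missing[0]] = bytes(rebuilt)
    body = b"".join(frags[:-1])  # drop the parity fragment
    size = int.from_bytes(body[:4], "big")
    return zlib.decompress(body[4:4 + size])
```

Each fragment returned by encode could be sent to a different data vault or cloud; a production embodiment would use a code that tolerates multiple losses.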

The various erasure coded fragments can be further directed through different routers spread across the world along different paths; for example, one path goes through the Atlantic while another goes through the Pacific. In this fashion, when file fragments are migrated, a man-in-the-middle attacker on one path cannot access the data, which is in any case encrypted and erasure coded. As all data can be stored in secure data vaults that are isolated from any in-bound network connections, data security risks at rest, in transit, and at the network attack level are addressed. Periodic data integrity checks are performed universally with SHA checks, validating the integrity of the data. Every file activity is centrally audited, with an optional integration with block chain for tamper proof storage of file hashes. All these security mechanisms are otherwise not available in any WAN scale file system.

Detailed aspects of the security enforcement are applied to all data assets from the central metadata controller, which also plays the role of a security enforcement point. Security metrics include the type of the file data, as indicated by the file extension. For instance, .xls indicates an Excel-based financial document, .cpp indicates a system program written in the C++ language, and so on. Further metrics are the owner id of the file, the source location of the file data, the time the file was ingested into the UFS, and organizational data governance policies as required by various compliance regulations such as GDPR, HIPAA, SOX, ISO etc. Data governance also includes data retention policies, archival media types, data access rights and various data control metrics. All these security parameters are entered into the metadata controller through the SD controller interface. Data governance requires interfaces to enter security and governance policies, a system to store and retain the policies, and an ability to apply them centrally to every file data asset.

This invention makes these tasks possible as a system because it has an interface to receive all security parameters through the SD controller, can store and protect these security parameters through the metadata controller, and can access every file data asset centrally, either through the core UFS module at the file system level or through a graphical user interface running as part of the metadata controller. The graphical user interface running in the metadata controller makes file access possible. When certain operations are performed, such as changing the access rights or retention policies, the change is distributed to all parts of the UFS system. Besides combining all file data objects located at various heterogeneous data sources into one large, integrated file system, it also implements various data security services such as data governance, central data security controls, and integrated data protection and migration services as part of the overall system.

As explained, the Universal File System thus not only provides a file delivery service when a user accesses the file system, it also converges various data services such as backup to cloud, archiving to cloud, storage migration across locations, cloud storage life cycle management, data governance, universal data search, dark data discovery, universal file storage de-duplication, secure data vaults, central control and visibility, and decentralized storage with built-in redundancy, all as a single, converged solution. This speaks further to the novelty of the invention.

UFS can optionally use block chain technologies to make file activity auditing tamper proof. UFS in any case records every storage activity, if so configured, and sends it to the security controller. The metadata controller, the SD controller, and the security services running as part of the metadata controller collect all activity logs, which are further dispersed to cloud storage services; additionally, SHA fingerprints of those file activity audit logs are stored in a publicly available block chain based distributed ledger, which is a tamper proof, distributed database. The block chain application programming interface (API) allows this data to be stored securely and without being tampered with.
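By way of non-limiting illustration only, the fingerprinting of audit log entries can be sketched with a local hash chain. The sketch is an assumption-laden stand-in: a real embodiment would submit the SHA fingerprints to a block chain ledger through that ledger's own API, which is not modeled here. The function names fingerprint, append and verify, and the record layout, are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical starting value for the chain

def fingerprint(entry: Dict[str, str], prev_hash: str) -> str:
    """SHA-256 over the canonical JSON form of the entry plus the previous fingerprint."""
    payload = json.dumps(entry, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(payload).hexdigest()

def append(ledger: List[Dict[str, str]], entry: Dict[str, str]) -> None:
    """Append an audit entry whose fingerprint also covers all earlier entries."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "entry": json.dumps(entry, sort_keys=True),
        "hash": fingerprint(entry, prev_hash),
    })

def verify(ledger: List[Dict[str, str]]) -> bool:
    """Recompute every fingerprint; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for record in ledger:
        if fingerprint(json.loads(record["entry"]), prev_hash) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

Because each fingerprint depends on the one before it, altering any stored audit entry causes verification to fail from that entry onward.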

Other distributed file systems present the same interface at all locations, such as a file system mount on a local machine. The Universal File System of this invention has dissimilar interfaces, such as a local file system mount in an on-premise gateway, Google cloud APIs in G-DRIVE, backup agents in remote sites, and so on. When a file is viewed from another site, it appears to the user as if it had been created by the local file system. As another copy of the data is available at other sites or in the clouds, and the metadata is also distributed with redundancy, the Universal File System has no single point of failure. When a remote office or branch office (ROBO) user logs in to a central portal, which runs as a cloud service, and uploads files through a browser, each file is stored redundantly across multiple data containers on-premise or in clouds, and the metadata is synced across all metadata controllers. All files, though created through dissimilar interfaces, are made to appear uniform and local, hence the name Universal file storage virtualization. The same technologies can be used to virtualize block level or object level data as well. Instead of file metadata, block level or object level metadata can be used to drive storage migration and movement across sites or clouds, from on-premise to clouds, or from clouds to on-premise.

In the explanations above, there are many detailed embodiments that can be derivative work. Metadata controllers, system controllers, security controllers and data controllers can be integrated in a single system in at least one embodiment. Metadata controllers can also be placed on-premise, or a primary metadata controller can be placed on-premise with a secondary one operated in the cloud. Metadata controllers receive various application requests to align storage virtualization with policies. For example, an API can request that certain data, owned or created by a certain site or service, be hidden from other users, whereas a similar request can cause certain site data to be instantly replicated to public storage clouds, and so on. In the invention, a processor executes one or more system mechanisms to perform file storage virtualization.

Implementation Specific Details:

As the invention has many different forms of embodiments and different components can be grouped in different ways, implementation steps will differ according to the specific embodiment. When source data is collected at the UFS module and sent to the system controller for transmission to the security controller and data containers, the security profile data will be modified from one set to the next as it traverses from the UFS module to the system controller. The UFS module will construct a security profile according to its local knowledge of the data. For example, if the UFS module is running as an agent module in a client system, it may treat a file as unclassified if this UFS module is not configured to receive security configuration updates from the system controller. It will set the security type to NORMAL, create data sets with the various metadata attributes of the file, and send them to the system controller. The system controller, having the latest security configuration updates received from the user, can determine that the security policy for the file is set as classified. It will then create a different security profile for this file before processing it, and will send the data portion of the file in object form after performing the configured data services for the file in question. Security configuration and security profile are interchangeable in many embodiments. In some cases, the security profile is static security configuration such as file type or file owner identification. The security profile can also be based on provisioned data security services entered by the user through the system controller. This could mean turning on erasure coding and encryption or data auditing for this particular file data. Security configuration can also be dynamic, for example based on file content. When the UFS module, while creating data sets and performing data services, learns that the content of a file includes sensitive data, it will dynamically update the security configuration of this file object and send this information to the security controller. 
The security controller will further redistribute the security profile, also known as security policies, and the modified security configuration of the file object to other UFS modules and security controllers. In most situations, security profile data remains the same before and after performing data services. On data access requests received at the configured UFS module, the module has to look up the updated security profile of the file object before trying to access the data. The UFS module will fetch the latest security profile from the security controller. This is necessary because data protection officers or data officers may change the security access control credentials at any time through the system controller. UFS provides different forms of data services to be applied to file objects in a unified manner across data silos. The system controller has to obtain configuration data for the services to be enabled at a specific UFS module or data silo. Services include data compression, data integrity monitoring, data activity monitoring, data auditing, erasure coding, de-duplication, storage intrusion services and encryption. Information on the selected data services will be updated to every UFS module and security controller through the system controller. The UFS module, in at least one embodiment, maintains data files as objects in binary form in a storage medium with versioning support. Whenever an object is updated, it receives a new version. The old version becomes immutable, and data objects are stored as versioned, binary objects in the data containers. This is useful for protecting data from cyber-attacks such as ransomware. The user will enter data classification policies to indicate critical data sets. One data classification policy can be a list of strings contained in the file name that indicate a critical file. If the file name contains one of these strings, the file is classified as critical, and additional data services will be provided for it. A data administrator cannot by default tell changes caused by ransomware attacks apart from legitimate changes. 
So, the user can enter policies by which data changes can be qualified as good changes as opposed to changes due to network worms. Similarly, policies for deciding whether specific data is valid can also be entered into the system controller configuration database. One example of valid data is a file having a specific entry at a specific offset. Similarly, one qualification of a data change as a good change, rather than a change due to an attack, can be a file modification in a region related to one that changed recently. For example, this could be a database file being updated in similar regions because of updates to a database table. Such data qualification parameters and data validity parameters are entered through the UFS configuration unit or through the system controller. When data is updated on the UFS module with a qualified, validated change, UFS will update the versions with an epoch change. This storage epoch change will advance the latest version to be the most up-to-date gold copy of the file. If storage auditing is configured, the UFS module will log every file system operation, including the file information and the id of the user who performed the operation. As UFS is deployed as a secondary storage platform, the user id will be that of the data management officer. The UFS module can choose the data containers to send the data to through the data controller. The UFS module has a configuration database that allows the user to select the list of data containers that form part of the data controller of the UFS module. One configuration can be five containers, wherein three containers are secure vault objects on-premise and two data containers are object storage services offered by third party cloud providers, forming a hybrid-cloud storage architecture in a decentralized manner. It is decentralized because there is no sharing of data content across data containers and no co-ordination is needed amongst data containers.
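By way of non-limiting illustration only, the file name based classification, the versioned objects and the gold copy epoch described above can be sketched as follows. The marker strings, the magic header value and all class and function names are hypothetical. The sketch also assumes that an unqualified change is still retained as a version but does not advance the gold copy; the disclosure does not state how such a change is stored.

```python
from typing import Callable, List

CRITICAL_MARKERS = ["payroll", "contract"]  # hypothetical user-entered classification strings

def is_critical(filename: str) -> bool:
    """Classify a file as critical if its name contains any configured string."""
    return any(marker in filename.lower() for marker in CRITICAL_MARKERS)

def has_magic_header(data: bytes) -> bool:
    """Example validity policy: a specific entry (b'UFS1') at a specific offset (0)."""
    return data[:4] == b"UFS1"

class VersionedObject:
    """Append-only version history with a gold copy pointer (the storage epoch)."""

    def __init__(self, validator: Callable[[bytes], bool]) -> None:
        self._versions: List[bytes] = []
        self._validator = validator
        self.gold_epoch = -1  # no validated version yet

    def update(self, data: bytes) -> bool:
        """Record a new version; advance the gold epoch only for a validated change."""
        self._versions.append(bytes(data))  # earlier versions are never overwritten
        if self._validator(data):
            self.gold_epoch = len(self._versions) - 1
            return True
        return False

    def gold_copy(self) -> bytes:
        """Return the most recent validated version."""
        if self.gold_epoch < 0:
            raise LookupError("no validated version yet")
        return self._versions[self.gold_epoch]
```

If a ransomware-style overwrite fails the validity policy, the gold copy continues to point at the last good version, which remains available for recovery.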

In normal operations, the security controller keeps monitoring every activity taking place on UFS modules and on-premise data containers through a security agent unit installed in the systems running the UFS module and the secure vault. System activity includes the number of processes running on the system, input and output activity on the system, CPU load on the system, and so on. In at least one embodiment, the data containers or secure vault run in a system with no static IP address configured. The security controller, system controller and UFS module act as a unit, called the data proxy, to communicate with the data container, and can exchange commands and information such as heart beats, system data and uptime through send operations or receive operations. During data send operations, the data proxy will keep the data in a queue and inform the data vault through a heart beat, and the data vault will pull the data from the data proxy.

Similarly, on a receive operation, the secure vault will send the data to the data proxy through a similar heart beat mechanism.
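By way of non-limiting illustration only, the heart beat driven send and receive operations can be sketched with in-memory objects. No networking is modeled, and the class and method names (DataProxy, SecureVault, heartbeat, on_heartbeat) are assumptions. The point shown is that the vault initiates every exchange: it learns from a heart beat that data is queued and then pulls it.

```python
from collections import deque
from typing import Deque, List

class DataProxy:
    """Queues data for the vault and receives data the vault sends back."""

    def __init__(self) -> None:
        self._outbound: Deque[bytes] = deque()
        self.inbound: List[bytes] = []

    def enqueue(self, data: bytes) -> None:
        """Send operation: hold data until the vault pulls it."""
        self._outbound.append(data)

    def heartbeat(self) -> int:
        """Heart beat reply: number of items waiting for the vault."""
        return len(self._outbound)

    def pull(self) -> bytes:
        """Hand the oldest queued item to the vault."""
        return self._outbound.popleft()

    def push(self, data: bytes) -> None:
        """Receive operation: accept data sent by the vault."""
        self.inbound.append(data)

class SecureVault:
    """Vault that initiates every exchange with the data proxy."""

    def __init__(self, proxy: DataProxy) -> None:
        self.proxy = proxy
        self.objects: List[bytes] = []

    def on_heartbeat(self) -> int:
        """Check the heart beat and pull everything the proxy has queued."""
        pending = self.proxy.heartbeat()
        for _ in range(pending):
            self.objects.append(self.proxy.pull())
        return pending

    def send(self, index: int) -> None:
        """Return a stored object to the proxy on a receive operation."""
        self.proxy.push(self.objects[index])
```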

Advantages of the Claimed Invention:

In one embodiment, the claimed invention helps in stitching together all fragmented data silos across various geographically distributed sites, data centers and cloud services as a centrally controllable data hub through control plane capabilities, while the actual data is stored in decentralized data vaults through data plane capabilities for cyber resilience, with information security assurance deeply integrated into the data foundation through security plane capabilities. In some other embodiments, this invention underpins products and technologies as a data governance platform, which requires security by design and automated capabilities for controlling and governing the data stored across various sites of the company in disparate storage systems and data silos, without actually making any changes to primary storage platforms. In many embodiments, the invention introduces the first data platform with built-in security and data mobility across sites, powered by file virtualization capabilities delivered on secondary storage platforms. Unlike other distributed file systems, UFS has a disaggregated control plane, data plane and security plane architecture, enabling the unified delivery of a variety of data management, data protection and data security services, based on global policies and data classifications, applied to data storage independent of its location. As UFS truly de-couples storage, access and security capabilities from location, it is the best choice for use as a data governance solution or as a mass data fragmentation solution without cyber threats. As the invention converges all secondary storage across data silos in one place, chief information officers now get a single pane of data access with central control, without worrying about a single point of breach. UFS does not store the full data of any file at any one location in most embodiments. 
So the loss of data at a subset of locations reveals no information and loses no data, with continuous security monitoring and storage activity surveillance in place. This makes the Universal File System an ideal choice for long term, secure archive use cases. As Universal File Virtualization is combined with data protection from all attached data silos, the invention is the first industry solution for providing secure data management to the various remote and branch offices of a distributed enterprise. As UFS has content awareness and data classification built in, with various data services such as encryption, erasure coding, data activity auditing, ransomware attack mitigation, and storage intrusion detection and active response that can be applied across multiple data repositories, the UFS system provides the best choice for storing sensitive and critical data sets in verticals such as defense, public sector, financial institutions and healthcare. No existing technologies are available to provide this capability as part of a file system.

In yet another embodiment, UFS provides immunity to the threat that quantum computing poses to cryptography, as no single piece of the data is stored in any single place. As UFS places user data in erasure coded data containers, storage security is based on information theory rather than on computational hardness, and therefore cannot be broken by cryptographic breaks achieved through quantum computing.

1. A method for implementing storage intrusion detection and a real time response system for a Universal File System (UFS) comprising a decentralized data plane, a system controller and a security controller, the method comprising: transferring data sets from a primary storage associated with a plurality of storage systems to a set of secure vaults; separating user data, metadata and security data from the data sets, wherein separating further comprises: transmitting the user data to a decentralized data plane through a predefined data path; transmitting the metadata to a system controller through a predefined control path; and transmitting the security data to a security controller through a predefined security plane; separating and sending storage intrusion data, including ransomware attack signatures, in the data sets to the security controller through the predefined security plane, wherein the method further comprises performing, at the security controller, at least one of: retrieving security configuration and security policy data corresponding to the data sets from the system controller; checking storage intrusion activities such as ransomware attack signature; verifying the data qualification parameters with security configuration data; effectuating a real time response to intrusion incidence, against an storage activity anomaly detected during the verification, in accordance with the security response parameters; and allowing a matched data to be stored in matched storage partitions of the UFS if no storage activity anomaly is detected, wherein the method further comprises performing, at the decentralized data plane, at least one of: storing the user data as immutable objects; running as an independent object storage system as part of third-party cloud storage services or as an onPremise object storage system; responding to a command request and a data request received from the security controller; responding to the command request and the data request received from system controller; responding to the command request and the data request received from one or more configured UFS module; sharing the user data without a statically configured IP address and ports with no network reachability to inbound network service and using reverse TCP data flows for data exchange; and exchanging data with a data proxy, through send operation and receive operation over a reverse TCP flow, wherein the secure vaults store a redundantly coded, sharded fragments of the user data revealing no data for ransomware attack tolerance, need no open ports for in-bound connection requests or static IP address, and wherein the security controller centrally monitors one or more data input and output activities performed on the storage controller.
2. The method of claim 1, wherein the security response includes at least one of disabling the UFS module from a further data service.
3. The method of claim 1 further comprises implementing a gold copy file system against ransomware attack, for a Universal File System (UFS) comprising a security controller functioning as a security plane and a centralized system controller having UFS modules configured to execute a method comprising the steps of: receiving data sets from a plurality of data sources at a plurality of data silos; extracting metadata, user data and security profile data at UFS modules; transferring metadata to a metadata controller; transferring security profile and security configuration data to a security controller, wherein a decentralized data plane associated with the UFS is configured to execute a method comprising the steps of: storing user data as immutable objects; responding to command and data requests from the security controller; responding to command and data requests from the system controller; responding to command and data requests from the UFS modules; initiating TCP connections with a data proxy; using reverse TCP data flows for data exchange; transferring data from the data proxy over the TCP connections, creating a backup epoch; and updating the gold copy with new epoch, after matching ransomware attack signature verification to create the new epoch, in accordance with the data qualification parameters, wherein the secure vaults provide no open ports for in-bound connection requests or static IP address and use the reverse TCP data flows to exchange data with the data proxy.
4. The method of claim 3, further comprises implementing a ransomware resilient file system supporting multiple data sites, and integrated as universal file system, the method comprising steps of: receiving a security profile and a security configuration data from different sites; classifying the data according to criticality and sensitivity of the data with predefined data classification parameters; processing different data according to a security profile stored at the security controller; initiating the configured data services at the system controller; disallowing an update of latest gold copy data with the new epoch, if the ransomware attack signature verification succeeds; disabling the UFS module on matching security policy upon detecting an input/output anomaly as real-time response, in accordance with the security profile data associated with the data set; and sending a shutdown message to the UFS module and the security vault from the security controller.
5. The method of claim 4, wherein the UFS modules are located in different sites distributed across a Wide Area Network (WAN).
6. The method of claim 4, wherein intrusion responses can be different based upon security response parameters and data classification configuration, which is centrally enforced from the system controller and the security controller.
7. A system for implementing a multi-silo data backup with a built-in ransomware resilience, the system comprising a system controller, a security controller, a secure vault and UFS modules, the UFS modules configured to execute a method comprising the steps of: receiving data sets from a plurality of data sources at a plurality of data silos; extracting metadata, user data and security profile data from the received data; transferring metadata to the system controller; and transferring a security profile and a security configuration data to the security controller, wherein the secure vault is configured to execute a method comprising the steps of: storing user data as immutable objects; responding to command and data requests from the security controller; responding to command and data requests from the system controller; responding to command and data requests from the configured UFS modules; initiating TCP connections with a data proxy; using reverse TCP data flows for the data exchange; transferring the data from the data proxy over the TCP flow, creating a backup epoch, updating known gold copy with new epoch after matching ransomware attack signature verification to create the new epoch, in accordance with the data qualification parameters, wherein the secure vault uses reverse TCP flow to exchange data with the data proxy and the plurality of said UFS modules retrieve the metadata from a local storage, and the second set of user data from the plurality of secure vaults, associated with data controller, and the security profile from the security controller, in response to receiving a data request from a user at second set of the plurality of UFS modules running in second set of data silos.
8. An architecture for implementing real time intrusion response to storage systems across multiple-sites, comprising: a system controller; and UFS modules consisting of a data proxy, a security controller and a decentralized data containers attached to a data controller, wherein the decentralized data containers are capable of executing data services and exchange data with third-party cloud storage services, and configured to execute a method comprising the steps of: receiving a data synchronously with external data clients without any in-bound connection establishment; exchanging data without any open ports for in-bound TCP/IP connection requests; initiating connections, and keep sending alive messages to the data proxy; exchanging messages with the data proxy to initiate data exchange; executing data receive operation using a reverse TCP flow; executing data send operation, using the reverse TCP flow; and storing data in an immutable, versioned binary objects at data containers, wherein the data containers are connected to security controller configured to execute a method comprising the steps of: receiving security profile data from the system controller module; monitoring the data activity operations on the plurality of configured data containers associated with data controller; monitoring the data activity operations on the plurality of configured UFS modules; perform real-time ransomware attack monitoring; extracting system activity events from the plurality of UFS modules and the plurality of data containers; processing security events data coming in through security plane for detecting any anomaly, for triggering security response parameters; and initiating the attack response actions in accordance with security response parameters on the plurality of data containers associated with data controller, upon detecting any storage activity anomaly.
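9. The architecture of claim 8, wherein the system controller and the security controller are connected to a plurality of data containers in a decentralized manner, while the user data, metadata and the security data get transmitted over data path, control path and security plane respectively, with security and metadata distributed to the UFS modules across sites.
10. The architecture of claim 8, wherein the UFS modules retrieve the metadata from a local storage and a second set of user data from the plurality of secure vaults associated with data controller and the security profile from the security controller in response to receiving a data request from a user at second set of the plurality of UFS modules running in second set of data silos.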
 10. Thearchitecture of claim 8, wherein the UFS modules retrieve the metadatafrom a local storage and a second set of user data from the plurality ofsecure vaults associated with data controller and the security profilefrom the security controller in response to receiving a data requestfrom a user at second set of the plurality of UFS modules running insecond set of data silos.